Autoencoder-Based Incremental Class Learning without Retraining on Old Data

Euntae Choi; Kiyoung Choi; Kyungmi Lee

arxiv: 1907.07872 · v1 · pith:PMK7AULXnew · submitted 2019-07-18 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-24 19:48 UTC · model grok-4.3

classification: cs.LG, stat.ML

keywords: incremental class learning, continual learning, autoencoder, prototypes, metric classification, catastrophic forgetting, CIFAR-100, CUB-200-2011

The pith

Storing only the mean prototype per class from an autoencoder allows incremental class learning without old data or high memory costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an incremental class learning approach in which an autoencoder extracts prototypes from inputs and only the mean of those prototypes is retained for each class. Classification then proceeds by metric comparison to these stored means rather than by expanding a softmax output layer. When a new task arrives, regularization is applied to the model to limit forgetting of earlier classes while no raw previous data is kept or replayed. On the CIFAR-100 and CUB-200-2011 benchmarks the resulting accuracy matches that of rehearsal-based or generative methods yet requires far less additional memory. The work therefore claims that prototype means suffice as a compact representation for sequential disjoint-class learning.

Core claim

An autoencoder can be trained so that its latent representations serve as prototypes; retaining only the per-class mean of these prototypes permits metric-based classification across sequentially presented disjoint classes, and regularization at each new task prevents catastrophic forgetting without any access to prior raw examples or generative reconstruction of them.

What carries the argument

Autoencoder that maps inputs to prototypes whose class-wise means are stored for nearest-prototype metric classification, with task-wise regularization.
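
The nearest-prototype step described above can be illustrated with a minimal sketch. It assumes the encoder outputs are already available as NumPy arrays and uses cosine similarity as the metric; the class name `PrototypeMeanClassifier`, its methods, and the toy latent values are hypothetical and are not taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

class PrototypeMeanClassifier:
    """Keeps one mean latent vector per class; no raw inputs are stored."""

    def __init__(self):
        self.means = {}  # class label -> mean latent vector

    def add_class(self, label, latents):
        # latents: (n_samples, latent_dim) encoder outputs for one new class
        self.means[label] = np.asarray(latents, dtype=float).mean(axis=0)

    def predict(self, latents):
        labels = list(self.means)
        protos = l2_normalize(np.stack([self.means[c] for c in labels]))
        sims = l2_normalize(np.asarray(latents, dtype=float)) @ protos.T
        return [labels[i] for i in sims.argmax(axis=1)]

# Toy latents standing in for encoder outputs (hypothetical values).
clf = PrototypeMeanClassifier()
clf.add_class("cat", [[1.0, 0.1], [0.9, 0.0]])
clf.add_class("dog", [[0.0, 1.0], [0.1, 0.8]])
preds = clf.predict([[2.0, 0.2], [0.1, 3.0]])
print(preds)  # ['cat', 'dog']
```

Adding a class only inserts one more entry in `means`, which is why no output unit has to be added in this kind of scheme.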

If this is right

Output layer size stays constant because new classes do not require new units.
Memory overhead grows only linearly with the number of classes and the prototype dimension rather than with the size of stored exemplars.
Rehearsal buffers and generative replay networks become unnecessary for this class-incremental setting.
Regularization alone, when paired with fixed prototype means, is claimed to be enough to control forgetting.
The same prototype-mean storage can be used for both classification and any downstream metric task without retraining the encoder on old data.
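
The memory argument can be made concrete with a small back-of-the-envelope calculation. All numbers below (a 256-dimensional latent space, 32-bit floats, a rehearsal buffer of 2000 images of size 32×32×3 at one byte per value) are illustrative assumptions, not figures reported in the paper.

```python
def prototype_memory_bytes(num_classes, latent_dim, bytes_per_value=4):
    """Storage for one mean vector per class (grows linearly in both factors)."""
    return num_classes * latent_dim * bytes_per_value

def exemplar_memory_bytes(num_exemplars, height, width, channels, bytes_per_value=1):
    """Storage for a rehearsal buffer of raw images."""
    return num_exemplars * height * width * channels * bytes_per_value

# Hypothetical setting: 100 classes, 256-dim latent means stored as float32.
proto_bytes = prototype_memory_bytes(num_classes=100, latent_dim=256)

# Hypothetical rehearsal buffer: 2000 uint8 images of size 32x32x3.
exemplar_bytes = exemplar_memory_bytes(num_exemplars=2000, height=32, width=32, channels=3)

ratio = exemplar_bytes / proto_bytes
print(proto_bytes, exemplar_bytes, ratio)  # 102400 6144000 60.0
```

Under these assumed sizes the prototype means take about 100 KiB while the image buffer takes about 6 MB; the actual gap depends on the latent dimension and buffer size used in a given experiment.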

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may scale to settings where storage is strictly limited, such as embedded devices that must learn new object categories over time.
If the mean prototype continues to work when the number of classes grows into the thousands, the method would imply that class-conditional statistics in latent space are unusually stable.
Combining the stored means with a small set of synthetic or distilled examples could be tested as a low-cost way to recover any lost accuracy.
The same prototype storage could support open-set recognition by flagging inputs whose distance to all stored means exceeds a threshold.
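
The open-set idea in the last item can be sketched as a distance threshold on top of nearest-mean classification. This is an illustration of the editorial extension only; the function names, the use of cosine distance, and the threshold value are assumptions, and nothing here is claimed by the paper.

```python
import numpy as np

def cosine_distances(query, means):
    """Cosine distance (1 - cosine similarity) from one query to each stored mean."""
    q = query / np.linalg.norm(query)
    m = means / np.linalg.norm(means, axis=1, keepdims=True)
    return 1.0 - m @ q

def classify_or_reject(query, means, labels, threshold):
    """Return the nearest class label, or None if every stored mean is too far."""
    d = cosine_distances(np.asarray(query, dtype=float), np.asarray(means, dtype=float))
    best = int(d.argmin())
    return labels[best] if d[best] <= threshold else None

# Hypothetical stored means for two known classes.
means = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
labels = ["cat", "dog"]

known = classify_or_reject([0.9, 0.1, 0.0], means, labels, threshold=0.2)
unknown = classify_or_reject([0.0, 0.0, 1.0], means, labels, threshold=0.2)
print(known, unknown)  # cat None
```

In practice the threshold would have to be tuned on held-out data, which is itself a design question for a setting that keeps no old examples.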

Load-bearing premise

The mean of the autoencoder prototypes for a class remains a sufficient statistic for correct metric classification of future inputs even after many new classes have been added and without any access to the original training images.

What would settle it

A controlled run on CIFAR-100 in which the method, after learning all classes sequentially, yields accuracy more than a few points below the rehearsal-based state-of-the-art baseline while still using only the reported memory budget.

Figures

Figures reproduced from arXiv: 1907.07872 by Euntae Choi, Kiyoung Choi, Kyungmi Lee.

**Figure 1.** Overall architecture of our model. … experiment on (CIFAR-100 and CUB-200), a VGG-19 [Simonyan and Zisserman, 2014] pretrained on ImageNet is attached in front of the encoder and works as a fixed feature extractor ϕ. This is similar to FearNet, which uses a pretrained ResNet-50 [He et al., 2016] as its feature extractor. Cosine Similarity-Based Classification The cosine version of NCM is used for classifi…

Original abstract

Incremental class learning, a scenario in continual learning context where classes and their training data are sequentially and disjointedly observed, challenges a problem widely known as catastrophic forgetting. In this work, we propose a novel incremental class learning method that can significantly reduce memory overhead compared to previous approaches. Apart from conventional classification scheme using softmax, our model bases on an autoencoder to extract prototypes for given inputs so that no change in its output unit is required. It stores only the mean of prototypes per class to perform metric-based classification, unlike rehearsal approaches which rely on large memory or generative model. To mitigate catastrophic forgetting, regularization methods are applied on our model when a new task is encountered. We evaluate our method by experimenting on CIFAR-100 and CUB-200-2011 and show that its performance is comparable to the state-of-the-art method with much lower additional memory cost.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper tries to cut memory in incremental class learning by storing only the mean of autoencoder prototypes per class and using metric classification plus regularization, but the abstract supplies no numbers or checks on whether those means stay useful after later tasks.

The main point is a memory-saving setup for continual learning: train an autoencoder to get prototypes, keep only the per-class mean in latent space, classify by nearest mean, and apply regularization on new tasks so the encoder does not erase old knowledge. No old raw data or generative replay is needed, and the output layer does not grow with new classes. That is the concrete difference from standard rehearsal or softmax-based incremental methods. On CIFAR-100 and CUB-200-2011 the abstract claims performance comparable to the state of the art at much lower added memory cost.

The idea is simple and directly targets the practical constraint of limited storage on edge devices. Using an autoencoder for prototypes rather than raw features or a classifier head is a reasonable choice that avoids some of the usual output-layer headaches. The regularization step is off-the-shelf, which keeps the method easy to implement.

The central risk is exactly the one in the stress-test note. If the encoder parameters shift on new data, even with regularization, the stored means may no longer sit near the current latent points of old classes. The abstract gives no latent-space plots, no intra-class variance numbers, and no ablation that replaces means with full prototype sets or checks reconstruction quality. Without those checks it is impossible to know whether the mean remains a sufficient statistic. The abstract also omits any quantitative results, baseline names, or statistical details, so the performance claim cannot be evaluated from what is shown.

This work is aimed at researchers already working on memory-constrained continual learning who want a lightweight prototype alternative. A reader looking for a clean, low-overhead baseline might get value from the full paper if the experiments are solid. The idea is coherent on its own terms and engages the literature on forgetting, so it deserves a serious referee even though the current evidence is thin.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes an autoencoder-based incremental class learning method that extracts prototypes from inputs, stores only the per-class mean of these prototypes, and performs nearest-mean metric classification without retraining on old data or using rehearsal buffers. Regularization is applied when new tasks arrive to mitigate catastrophic forgetting. Experiments on CIFAR-100 and CUB-200-2011 are reported to achieve performance comparable to state-of-the-art methods at substantially lower additional memory cost.

Significance. If the empirical results hold under the stated assumptions, the method supplies a simple, low-memory alternative to rehearsal and generative-model approaches in class-incremental learning by combining an autoencoder with off-the-shelf regularization and prototype means. This could be practically relevant for memory-constrained continual-learning settings.

major comments (2)

[Abstract and §4 (Experiments)] The central claim of 'comparable performance' to SOTA is asserted without any reported accuracy numbers, baseline methods, statistical tests, or ablation results in the abstract; the full experimental section must supply these quantitative details and controls to substantiate the claim.
[§3 (Method)] The claim that storing only the per-class mean of autoencoder prototypes remains a sufficient statistic for metric classification after subsequent regularization steps rests on the unverified assumptions that (a) intra-class latent distributions remain compact and unimodal and (b) the encoder does not shift old-class points far from their stored means; no latent-space visualizations, intra-class variance statistics, or ablation replacing means with full prototype sets are provided to test these conditions.

minor comments (1)

[§3] Clarify the precise regularization term (e.g., EWC, SI, or other) and the joint training objective for the autoencoder plus classifier in the incremental phase.
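
For orientation, the family of regularizers the referee names can be written as a quadratic penalty that anchors parameters to their values after the previous task. The sketch below is a generic form under that assumption: EWC would supply a diagonal Fisher estimate as `importance`, while SI and MAS supply other importance estimates. Which term the paper actually uses is not stated in the material shown here, and the function and parameter names are hypothetical.

```python
import numpy as np

def quadratic_penalty(params, anchor_params, importance, strength):
    """(strength / 2) * sum_i importance_i * (theta_i - anchor_i)^2 over all parameters."""
    total = 0.0
    for name, theta in params.items():
        diff = theta - anchor_params[name]
        total += float(np.sum(importance[name] * diff ** 2))
    return 0.5 * strength * total

def incremental_loss(task_loss, params, anchor_params, importance, strength=1.0):
    """New-task loss plus the penalty that discourages moving important parameters."""
    return task_loss + quadratic_penalty(params, anchor_params, importance, strength)

# Toy example: only the first weight is marked as important for old classes.
anchor = {"w": np.array([1.0, 2.0])}        # parameters after the previous task
current = {"w": np.array([1.5, 5.0])}       # parameters during the new task
importance = {"w": np.array([4.0, 0.0])}    # hypothetical importance weights

penalty = quadratic_penalty(current, anchor, importance, strength=1.0)
total = incremental_loss(task_loss=0.25, params=current, anchor_params=anchor,
                         importance=importance, strength=1.0)
print(penalty, total)  # 0.5 0.75
```

The second weight moves a long way at no cost because its importance is zero, which is the mechanism by which such penalties leave capacity free for new classes.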

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to clarify and strengthen our manuscript. We address each major comment below and indicate the revisions we will make.

Referee: [Abstract and §4 (Experiments)] The central claim of 'comparable performance' to SOTA is asserted without any reported accuracy numbers, baseline methods, statistical tests, or ablation results in the abstract; the full experimental section must supply these quantitative details and controls to substantiate the claim.

Authors: We agree that the abstract would be strengthened by including specific quantitative results. The experimental section (§4) already reports accuracy numbers, baseline comparisons (including rehearsal and generative-model methods), and memory overhead on CIFAR-100 and CUB-200-2011, along with controls for the regularization approach. In the revision we will update the abstract to cite key accuracy figures and memory savings relative to SOTA, and we will ensure §4 explicitly tabulates all baselines, any statistical tests performed, and ablation results.
revision: yes

Referee: [§3 (Method)] The claim that storing only the per-class mean of autoencoder prototypes remains a sufficient statistic for metric classification after subsequent regularization steps rests on the unverified assumptions that (a) intra-class latent distributions remain compact and unimodal and (b) the encoder does not shift old-class points far from their stored means; no latent-space visualizations, intra-class variance statistics, or ablation replacing means with full prototype sets are provided to test these conditions.

Authors: We acknowledge that the sufficiency of per-class means after regularization is an assumption that benefits from direct verification. Our regularization term is designed to penalize large shifts in the latent representations of previous classes, which we expect to keep intra-class distributions compact. To substantiate this, the revised manuscript will add (i) t-SNE or PCA visualizations of the latent space before and after new tasks, (ii) per-class variance statistics in the prototype space, and (iii) an ablation that compares nearest-mean classification against storing and using the full set of prototypes. These additions will empirically test assumptions (a) and (b).
revision: yes
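
The two assumptions under discussion, (a) compact intra-class latents and (b) limited drift of old-class points away from their stored means, could be checked with simple statistics along the following lines. This is a hedged sketch of one possible diagnostic, not the authors' evaluation code; the function names and toy arrays are hypothetical.

```python
import numpy as np

def intra_class_variance(latents):
    """Mean squared Euclidean distance of a class's latents from their own mean."""
    latents = np.asarray(latents, dtype=float)
    centered = latents - latents.mean(axis=0)
    return float(np.mean(np.sum(centered ** 2, axis=1)))

def mean_drift(stored_mean, latents_after):
    """Distance between the stored class mean and the mean of re-encoded latents."""
    new_mean = np.asarray(latents_after, dtype=float).mean(axis=0)
    return float(np.linalg.norm(new_mean - np.asarray(stored_mean, dtype=float)))

# Toy old-class latents before a new task, and the same inputs re-encoded
# after the encoder has been updated (here simulated as a uniform shift).
before = np.array([[1.0, 0.0], [3.0, 0.0]])
after = before + np.array([0.0, 2.0])
stored = before.mean(axis=0)

var_before = intra_class_variance(before)
var_after = intra_class_variance(after)
drift = mean_drift(stored, after)
print(var_before, var_after, drift)  # 1.0 1.0 2.0
```

In this toy case the variance is unchanged while the mean has drifted by 2.0, which shows why both statistics are needed. Computing them requires held-out examples of old classes, so they belong to offline evaluation rather than to the incremental procedure itself.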

Circularity Check

0 steps flagged

No significant circularity; method is empirically described without self-referential derivations

The provided abstract and description contain no equations, derivations, or self-citations that reduce any claimed result to a fitted parameter or input by construction. The approach is presented as storing per-class prototype means from an autoencoder and applying off-the-shelf regularization, with performance evaluated empirically on CIFAR-100 and CUB-200-2011. No load-bearing steps match the enumerated circularity patterns; the central claim rests on experimental comparison rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, mathematical axioms, or newly postulated entities; the method is described in terms of standard autoencoders and regularization already present in the literature.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 8 internal anchors

[1] [Aljundi et al., 2018] Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, and Tinne Tuytelaars. Memory aware synapses: Learning what (not) to forget. In Computer Vision – ECCV 2018, pages 144–161, 2018.

[2] [Breunig et al., 2000] Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. LOF: identifying density-based local outliers. In ACM SIGMOD Record, volume 29, pages 93–104. ACM, 2000.

[3] [Clevert et al., 2015] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289, 2015.

[4] [French, 1999] Robert M French. Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences, 3(4):128–135, 1999.

[5] [Gepperth and Karaoguz, 2016] Alexander Gepperth and Cem Karaoguz. A bio-inspired incremental learning architecture for applied perceptual problems. Cognitive Computation, 8:924–934, 2016.

[6] [Guerriero et al., 2018] Samantha Guerriero, Barbara Caputo, and Thomas Mensink. DeepNCM: Deep nearest class mean classifiers. In International Conference on Learning Representations, Workshop Track, 2018.

[7] [He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Computer Vision and Pattern Recognition (CVPR), pages 770–778. IEEE, 2016.

[8] [Hinton et al., 2015] Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, 2015.

[9] [Hsu et al., 2018] Yen-Chang Hsu, Yen-Cheng Liu, and Zsolt Kira. Re-evaluating continual learning scenarios: A categorization and case for strong baselines. arXiv preprint arXiv:1810.12488, 2018.

[10] [Kemker and Kanan, 2018] Ronald Kemker and Christopher Kanan. FearNet: Brain-inspired model for incremental learning. In International Conference on Learning Representations, 2018.

[11] [Kemker et al., 2018] Ronald Kemker, Marc McClure, Angelina Abitino, Tyler L Hayes, and Christopher Kanan. Measuring catastrophic forgetting in neural networks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[12] [Kirkpatrick et al., 2017] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, page 201611835, 2017.

[13] [Krizhevsky and Hinton, 2009] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.

[14] [Kumaran et al., 2016] Dharshan Kumaran, Demis Hassabis, and James L McClelland. What learning systems do intelligent agents need? Complementary learning systems theory updated. Trends in Cognitive Sciences, 20(7):512–534, 2016.

[15] [Li and Hoiem, 2018] Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2018.

[16] [Lopez-Paz and Ranzato, 2017] David Lopez-Paz and Marc'Aurelio Ranzato. Gradient episodic memory for continual learning. In Advances in Neural Information Processing Systems 30, pages 6467–6476. Curran Associates, Inc., 2017.

[17] [Maltoni and Lomonaco, 2018] Davide Maltoni and Vincenzo Lomonaco. Continuous learning in single-incremental-task scenarios. arXiv preprint arXiv:1806.08568, 2018.

[18] [McClelland et al., 1995] James L McClelland, Bruce L McNaughton, and Randall C O'Reilly. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3):419, 1995.

[19] [McCloskey and Cohen, 1989] Michael McCloskey and Neal J Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, volume 24, pages 109–165, 1989.

[20] [Mensink et al., 2013] T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka. Distance-based image classification: Generalizing to new classes at near-zero cost. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2624–2637, Nov 2013.

[21] [Parisi et al., 2018] German I Parisi, Ronald Kemker, Jose L Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review. arXiv preprint arXiv:1802.07569, 2018.

[22] [Rasmus et al., 2015] Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, pages 3546–3554, 2015.

[23] [Rebuffi et al., 2017] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. iCaRL: Incremental classifier and representation learning. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.

[24] [Reddi et al., 2018] Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. On the convergence of Adam and beyond. In International Conference on Learning Representations, 2018.

[25] [Serra et al., 2018] Joan Serra, Dídac Surís, Marius Miron, and Alexandros Karatzoglou. Overcoming catastrophic forgetting with hard attention to the task. arXiv preprint arXiv:1801.01423, 2018.

[26] [Simonyan and Zisserman, 2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[27] [Wah et al., 2011] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.

[28] [Wang et al., 2018] Shuai Wang, Zili Huang, Yanmin Qian, and Kai Yu. Deep discriminant analysis for i-vector based robust speaker recognition. arXiv preprint arXiv:1805.01344, 2018.

[29] [Wu et al., 2018] Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, Zhengyou Zhang, and Yun Fu. Incremental classifier learning with generative adversarial networks. arXiv preprint arXiv:1802.00853, 2018.

[30] [Zenke et al., 2017] Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3987–3995. PMLR, 06–11 Aug 2017.
