arxiv: 1907.02788 · v1 · pith:PLHLKHMTnew · submitted 2019-07-05 · 💻 cs.LG · cs.CV

Incremental Concept Learning via Online Generative Memory Recall

Pith reviewed 2026-05-25 01:58 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords class incremental learning · catastrophic forgetting · pseudo-rehearsal · conditional GAN · continual learning · generative memory recall · concept contrastive loss

The pith

A conditional GAN generates pseudo-samples of old concepts to prevent catastrophic forgetting during incremental class learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to solve catastrophic forgetting in neural networks by enabling them to learn new concepts from streaming data without access to past examples. It does this by training a conditional generative adversarial network on limited old data to produce pseudo-samples that stand in for previous concepts. These generated samples are then recalled in a balanced way while new classes are learned, and a contrastive loss keeps weight changes from erasing old knowledge. A sympathetic reader would care because this removes the need to store all past data, opening a path to networks that accumulate concepts over time like lifelong learners.

Core claim

The central claim is that a conditional GAN can consolidate memory of old concepts by generating pseudo-samples, which are then used in a balanced online recall strategy together with a concept contrastive loss; this combination allows a neural network to learn new classes incrementally on MNIST, Fashion-MNIST, and SVHN while keeping performance on earlier classes high.

What carries the argument

The argument rests on the conditional generative adversarial network that produces pseudo-samples of past classes, together with the balanced online memory recall strategy and the concept contrastive loss that limits weight drift.
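
A minimal sketch of what one balanced recall step could look like, assuming "balanced" means an equal number of samples per class across old and new classes (the summary does not spell out the exact rule). The names `balanced_recall_batch`, `generate`, and `fake_generator` are hypothetical, and the generator here is a noise stand-in rather than a trained conditional GAN.

```python
import numpy as np

def balanced_recall_batch(new_x, new_y, generate, old_classes, per_class, rng):
    """Build one training batch with `per_class` samples for every class seen so far.

    `generate(label, n)` is a hypothetical stand-in for a trained conditional
    generator; it must return an array of n pseudo-samples for `label`.
    """
    xs, ys = [], []
    # Real samples for each new class, drawn from the current data stream.
    for c in np.unique(new_y):
        idx = rng.choice(np.flatnonzero(new_y == c), size=per_class, replace=True)
        xs.append(new_x[idx])
        ys.append(np.full(per_class, c))
    # Pseudo-samples for each old class, recalled from the generator.
    for c in old_classes:
        xs.append(generate(c, per_class))
        ys.append(np.full(per_class, c))
    return np.concatenate(xs), np.concatenate(ys)

# Toy usage: 2 old classes, 2 new classes, 4-dimensional "images".
rng = np.random.default_rng(0)
new_x = rng.normal(size=(20, 4))
new_y = np.array([2] * 10 + [3] * 10)

def fake_generator(label, n):
    # Assumption: a real cGAN would go here; this returns label-shifted noise.
    return rng.normal(loc=float(label), size=(n, 4))

batch_x, batch_y = balanced_recall_batch(new_x, new_y, fake_generator, [0, 1], 5, rng)
class_counts = np.bincount(batch_y)  # one count per class label 0..3
```

Sampling per class rather than per source keeps each class's share of the batch constant as the number of old classes grows, which is one plausible reading of "roughly equal influence".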

If this is right

  • Networks can add new classes sequentially while retaining accuracy on all previous classes without storing the original training data.
  • The balanced recall strategy keeps the influence of old and new classes roughly equal during each training step.
  • The concept contrastive loss reduces the magnitude of weight updates that would otherwise overwrite earlier concepts.
  • The method shows measurable gains over other rehearsal baselines on the three evaluated image datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If better conditional GANs become available, the same rehearsal idea could be tested on higher-resolution or more diverse image sets where distribution matching is harder.
  • The pseudo-sample approach could be combined with parameter-isolation methods to further reduce interference between tasks.
  • Memory savings from not storing raw past data would become more valuable as the number of incremental steps grows.

Load-bearing premise

The conditional GAN, trained only on the small set of past data available at each step, produces pseudo-samples whose distribution is close enough to the true old data that rehearsal on them stops forgetting without creating new biases.

What would settle it

Measure accuracy on old classes after incremental training using only the GAN-generated samples versus using the real old samples; if the gap is large and forgetting remains severe with the generated samples, the approach fails.
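
The comparison described above can be sketched as a single number: old-class accuracy under real rehearsal minus old-class accuracy under generated rehearsal. The function names and the toy prediction arrays below are illustrative assumptions, not results from the paper.

```python
import numpy as np

def old_class_accuracy(y_true, y_pred, old_classes):
    """Accuracy restricted to test samples whose true label is an old class."""
    mask = np.isin(y_true, old_classes)
    return float(np.mean(y_pred[mask] == y_true[mask]))

def rehearsal_gap(y_true, pred_real, pred_generated, old_classes):
    """Old-class accuracy with real rehearsal minus accuracy with generated rehearsal."""
    return (old_class_accuracy(y_true, pred_real, old_classes)
            - old_class_accuracy(y_true, pred_generated, old_classes))

# Hypothetical predictions from two models trained on the same increments.
y_true         = np.array([0, 0, 1, 1, 2, 2, 3, 3])
pred_real      = np.array([0, 0, 1, 1, 2, 2, 3, 3])  # rehearsed on real old data
pred_generated = np.array([0, 2, 1, 3, 2, 2, 3, 3])  # rehearsed on pseudo-samples
gap = rehearsal_gap(y_true, pred_real, pred_generated, old_classes=[0, 1])
```

A gap near zero would support the load-bearing premise; a large positive gap would indicate that the pseudo-samples are not an adequate substitute for the real data.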

Figures

Figures reproduced from arXiv: 1907.02788 by Bao-Gang Hu, Huaiyu Li, Weiming Dong.

Figure 1: The ICLNet architecture for class incremental learning tasks.
Figure 2: The 2-D features visualization of currently learned classes during training on class incremental learning tasks. (a) Fully connected classifier trained with
Figure 3: The recalled samples from our RecallNet during learning 5 class incremental tasks on datasets MNIST, Fashion-MNIST and SVHN.
Figure 4: The average incremental accuracy for different methods during learning 5 class incremental tasks on datasets (a) MNIST, (b) Fashion-MNIST, (c) SVHN
Figure 5: The recall samples from RecallNet after learning four class incremental
Figure 6: The average incremental accuracy for different methods during
Figure 7: The average incremental accuracy for comparing ICLNet with iCaRL
Original abstract

The ability to learn more and more concepts over time from incrementally arriving data is essential for the development of a life-long learning system. However, deep neural networks often suffer from forgetting previously learned concepts when continually learning new concepts, which is known as catastrophic forgetting problem. The main reason for catastrophic forgetting is that the past concept data is not available and neural weights are changed during incrementally learning new concepts. In this paper, we propose a pseudo-rehearsal based class incremental learning approach to make neural networks capable of continually learning new concepts. We use a conditional generative adversarial network to consolidate old concepts memory and recall pseudo samples during learning new concepts and a balanced online memory recall strategy is to maximally maintain old memories. And we design a comprehensible incremental concept learning network as well as a concept contrastive loss to alleviate the magnitude of neural weights change. We evaluate the proposed approach on MNIST, Fashion-MNIST and SVHN datasets and compare with other rehearsal based approaches. The extensive experiments demonstrate the effectiveness of our approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a pseudo-rehearsal approach for class-incremental learning that trains a conditional GAN on past data to generate pseudo-samples, employs a balanced online memory recall strategy during new-concept training, and introduces a concept contrastive loss within an incremental concept learning network to reduce catastrophic forgetting. Effectiveness is demonstrated via comparative experiments against other rehearsal-based methods on the MNIST, Fashion-MNIST, and SVHN datasets.

Significance. If the cGAN pseudo-samples faithfully approximate old-concept distributions, the method would provide a storage-efficient alternative to exemplar rehearsal for continual learning. The empirical comparisons on three standard benchmarks constitute a concrete contribution, but the absence of direct fidelity metrics or ablations leaves the practical significance dependent on unverified assumptions about sample quality.

major comments (2)
  1. [Experiments] The central claim (abstract and §3) that rehearsal on cGAN pseudo-samples prevents forgetting requires the generated distribution to remain close to the true old-concept distribution. No quantitative validation—such as FID scores, MMD distances, or per-increment distribution-shift measurements—is reported in the experimental section, leaving the key assumption untested on the evaluated datasets.
  2. [Method and Experiments] The balanced online memory recall strategy and concept contrastive loss each introduce a free weighting parameter (memory recall balance ratio; concept contrastive loss weighting coefficient). No sensitivity analysis or ablation removing either component is provided, so it is unclear whether reported gains are robust or attributable to these specific mechanisms rather than the base rehearsal setup.
minor comments (2)
  1. [Abstract] The abstract contains a sentence fragment beginning with a capitalized 'And'; this should be rephrased for grammatical consistency.
  2. [Method] Notation for the conditional GAN generator and the incremental network should be unified across equations to avoid ambiguity between G and the classifier f.
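
As a rough illustration of the fidelity check requested in major comment 1, the sketch below computes a biased squared-MMD estimate with an RBF kernel between two sample sets. The feature arrays are synthetic Gaussians standing in for real and generated old-class features, and the bandwidth of 4.0 is an arbitrary assumption; none of this comes from the paper.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    # Squared Euclidean distances between every row of a and every row of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    return float(rbf_kernel(x, x, bandwidth).mean()
                 + rbf_kernel(y, y, bandwidth).mean()
                 - 2.0 * rbf_kernel(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 8))     # stand-in for real old-class features
close = rng.normal(0.0, 1.0, size=(200, 8))    # generator matching the distribution
shifted = rng.normal(1.5, 1.0, size=(200, 8))  # generator that has drifted

mmd_close = mmd2_biased(real, close, bandwidth=4.0)
mmd_shifted = mmd2_biased(real, shifted, bandwidth=4.0)
```

Tracking such a statistic per class after every increment would show whether the generator drifts as it is repeatedly retrained on its own outputs.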

Simulated Authors' Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Experiments] The central claim (abstract and §3) that rehearsal on cGAN pseudo-samples prevents forgetting requires the generated distribution to remain close to the true old-concept distribution. No quantitative validation—such as FID scores, MMD distances, or per-increment distribution-shift measurements—is reported in the experimental section, leaving the key assumption untested on the evaluated datasets.

    Authors: We agree that direct quantitative validation of cGAN sample fidelity would strengthen support for the central claim. In the revised manuscript we will report FID scores (and MMD where space permits) between generated pseudo-samples and held-out real samples from prior concepts at each incremental step on all three datasets. This addition will test the distribution-approximation assumption explicitly.
    Revision: yes

  2. Referee: [Method and Experiments] The balanced online memory recall strategy and concept contrastive loss each introduce a free weighting parameter (memory recall balance ratio; concept contrastive loss weighting coefficient). No sensitivity analysis or ablation removing either component is provided, so it is unclear whether reported gains are robust or attributable to these specific mechanisms rather than the base rehearsal setup.

    Authors: We acknowledge that sensitivity analyses and component ablations are needed to isolate the contributions of the balance ratio and contrastive-loss weight. The revision will add (i) an ablation comparing the full model against variants that disable balanced recall and that disable the contrastive loss, and (ii) performance curves over a range of values for each hyper-parameter. These results will demonstrate robustness beyond the base rehearsal setup.
    Revision: yes
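
The ablation and sensitivity plan in the second response could be laid out as a configuration grid along the following lines. This is a sketch only: `ablation_grid`, the dictionary keys, and the candidate values are hypothetical, and the paper's tuned settings are not given here.

```python
from itertools import product

def ablation_grid(recall_ratios, contrastive_weights):
    """Enumerate configurations: the full model, each component disabled, and a sweep."""
    configs = []
    for balanced, use_contrastive in product([True, False], repeat=2):
        # None marks unbalanced recall; 0.0 switches the contrastive term off.
        ratios = recall_ratios if balanced else [None]
        weights = contrastive_weights if use_contrastive else [0.0]
        for ratio, weight in product(ratios, weights):
            configs.append({"balanced_recall": balanced,
                            "recall_ratio": ratio,
                            "contrastive_weight": weight})
    return configs

# Hypothetical sweep values chosen only to illustrate the grid shape.
grid = ablation_grid(recall_ratios=[0.5, 1.0, 2.0], contrastive_weights=[0.01, 0.1])
```

The grid contains the base rehearsal setup (both components off) as one of its entries, so gains can be attributed to each mechanism separately.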

Circularity Check

0 steps flagged

No circularity; empirical method with no load-bearing self-citations or fitted predictions

The paper describes a pseudo-rehearsal method using a conditional GAN to generate samples for rehearsal, a balanced recall strategy, and a concept contrastive loss, then reports empirical results on MNIST-scale datasets. No derivation chain, equations, or self-cited uniqueness theorems appear in the text; the central claim rests on experimental comparisons rather than any step that reduces by construction to its own inputs or prior author work. This is the expected non-finding for an applied incremental-learning paper.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The approach depends on the generative model being able to stand in for real past data and on the contrastive loss sufficiently constraining weight drift; both are domain assumptions rather than derived quantities.

free parameters (2)
  • memory recall balance ratio
    Controls proportion of generated old samples versus new samples at each training step; chosen to maintain old memories.
  • concept contrastive loss weighting coefficient
    Scales the contribution of the contrastive term relative to classification loss; tuned to limit weight change magnitude.
axioms (1)
  • domain assumption: A conditional GAN trained on past data can produce samples whose statistics are sufficiently close to the original concept distributions for effective rehearsal.
    Invoked when the paper states that the cGAN consolidates old concept memory and that pseudo samples are recalled during new-concept learning.
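
One generic way to write down where the two free parameters enter is given below. The symbols $\lambda$, $\rho$, $\mathcal{L}_{\text{cls}}$, and $\mathcal{L}_{\text{con}}$ are notational assumptions made for this ledger, not the paper's own notation, and the paper's exact objective may differ.

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{cls}} + \lambda\, \mathcal{L}_{\text{con}},
\qquad
\rho = \frac{n_{\text{recalled}}}{n_{\text{new}}}
```

Here $\lambda$ is the concept contrastive loss weighting coefficient, and $\rho$ is the memory recall balance ratio, with $n_{\text{recalled}}$ generated old-class samples and $n_{\text{new}}$ real new-class samples per training step.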

