pith. machine review for the scientific record.

arxiv: 2603.13804 · v2 · submitted 2026-03-14 · 💻 cs.LG · cs.AI

Recognition: no theorem link

Memory-efficient Continual Learning with Prototypical Exemplar Condensation

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 11:42 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords continual learning · catastrophic forgetting · rehearsal methods · memory efficiency · exemplar synthesis · prototypical representations · data augmentation

The pith

Storing a small number of synthesized prototypical exemplars removes the need to keep many real samples in rehearsal-based continual learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to compress memory in continual learning by synthesizing and storing prototypical exemplars rather than selecting and keeping large numbers of real past samples. These exemplars are designed so that, once passed through a feature extractor, they form representative prototypes of previous task data. A perturbation-based augmentation step generates additional synthetic variants during training to further reduce forgetting. On standard benchmarks the approach matches or exceeds existing coreset methods while using far fewer stored items per class, with the advantage growing on large datasets and sequences with many tasks. Because only synthetic items are kept, the method also avoids storing private real data.

Core claim

Prototypical exemplar condensation lets a model retain knowledge from prior tasks by storing only a handful of synthesized samples whose feature-extractor outputs serve as compact, representative prototypes; combined with perturbation augmentation at training time, this yields higher accuracy than rehearsal baselines that store many more real samples, especially when the number of tasks or the dataset scale increases.

What carries the argument

Prototypical exemplar condensation: the synthesis of a few samples per class that, after feature extraction, act as representative prototypes for replay, paired with a perturbation-based augmentation that creates synthetic training variants from them.
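
Read together with the Figure 1 caption (the synthetic image is decoded from an optimized latent as s = g(z)), one plausible shape for this condensation step is sketched below. The loss (feature-space distance to the class-mean prototype) and every name here (condense_exemplar, the frozen decoder g, the feature extractor f) are illustrative assumptions, not the authors' published objective.

    # Hedged sketch of one way prototypical exemplar condensation could work.
    # Assumes a frozen pretrained decoder g and the current feature extractor f;
    # the feature-matching loss is an assumption, not the paper's exact objective.
    import torch

    def condense_exemplar(f, g, class_features, latent_dim=128, steps=500, lr=0.1):
        """Optimize a latent z so that f(g(z)) lands on the class-mean prototype."""
        prototype = class_features.mean(dim=0).detach()  # target: mean feature of the class
        z = torch.randn(1, latent_dim, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            s = g(z)  # synthetic exemplar in input space, s = g(z) as in Figure 1
            loss = torch.nn.functional.mse_loss(f(s).squeeze(0), prototype)
            loss.backward()
            opt.step()
        return g(z).detach()  # store this condensed exemplar instead of real samples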

If this is right

  • Memory per class can drop well below the 20-sample threshold common in prior rehearsal work while accuracy stays competitive or improves.
  • Performance gains become larger as the number of sequential tasks or the size of the dataset grows.
  • No real samples from earlier tasks need to be retained, improving privacy.
  • The same condensation step can be inserted into existing rehearsal pipelines without changing the underlying model architecture.
  • Training-time augmentation via perturbations further boosts retention without extra stored data (see the sketch after this list).
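
The perturbation mechanism is the most self-contained of these pieces. The paper's related-work discussion points to Gaussian noise injection around prototypes, so a minimal sketch under that assumption follows; the function name and sigma are hypothetical, and the paper does not publish the exact perturbation distribution.

    # Hedged sketch of perturbation-based augmentation; isotropic Gaussian noise
    # is assumed here, following the prototype-augmentation line of work the
    # paper cites, not a published specification.
    import torch

    def perturb_variants(exemplars, num_variants=4, sigma=0.05):
        """exemplars: (n, C, H, W) stored exemplars -> (n * num_variants, C, H, W) variants."""
        variants = exemplars.repeat_interleave(num_variants, dim=0)
        return variants + sigma * torch.randn_like(variants)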

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could make continual learning feasible on memory-limited edge devices where storing even 20 real images per class is prohibitive.
  • Similar condensation might be tested in non-vision domains such as language or graph tasks where rehearsal memory is also a bottleneck.
  • If the prototypes remain effective after feature-extractor updates, the method could support longer task sequences than current rehearsal limits allow.
  • One could measure how closely the condensed exemplars match the original class-conditional distributions in feature space to quantify the compression limit (one minimal way to do this is sketched below).
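
The last suggestion is directly measurable. A minimal sketch, comparing first- and second-moment statistics of real versus condensed features; feature_stat_gap and the choice of statistics are editorial assumptions, not something the paper reports.

    # Illustrative measurement, not from the paper: how well do condensed
    # exemplars cover a class's feature distribution? Compares feature means
    # and covariances; both sets need at least two samples for a finite covariance.
    import torch

    def feature_stat_gap(f, real_images, condensed_images):
        """Return (mean gap, covariance gap) between real and condensed feature sets."""
        with torch.no_grad():
            real_feats = f(real_images)       # (N, d)
            cond_feats = f(condensed_images)  # (n, d), with n << N
        mean_gap = (real_feats.mean(0) - cond_feats.mean(0)).norm().item()
        cov_gap = (torch.cov(real_feats.T) - torch.cov(cond_feats.T)).norm().item()
        return mean_gap, cov_gap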

Load-bearing premise

The synthesized prototypical exemplars, once passed through the feature extractor, preserve enough statistical properties of the original task data to prevent catastrophic forgetting.

What would settle it

Measure the forgetting rate on a multi-task benchmark such as CIFAR-100 split into 20 tasks when replay uses only the synthesized exemplars versus an equal number of real samples; a large gap in final accuracy would falsify the claim.
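
That test is mechanical enough to sketch as a harness. Everything named here (model_factory, train_task, evaluate, the buffers) is hypothetical scaffolding standing in for an existing continual-learning codebase.

    # Hypothetical harness for the falsification test above (not from the paper).
    def forgetting_gap(model_factory, tasks, synth_buffer, real_buffer,
                       train_task, evaluate):
        """Final-accuracy gap between equal-size real replay and synthetic replay."""
        results = {}
        for name, buffer in [("synthetic", synth_buffer), ("real", real_buffer)]:
            model = model_factory()                   # fresh model per condition
            for task_data in tasks:                   # e.g. CIFAR-100 split into 20 tasks
                train_task(model, task_data, buffer)  # rehearsal training on one task
            results[name] = evaluate(model, tasks)    # final average accuracy, all tasks
        # A large positive gap would falsify the claim that synthetic exemplars suffice.
        return results["real"] - results["synthetic"]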

Figures

Figures reproduced from arXiv: 2603.13804 by Dung D. Le, Kok-Seng Wong, Le-Tuan Nguyen, Minh-Duong Nguyen, Thien-Thanh Dao.

Figure 1. Illustration of ProtoCore architecture. Given the optimized latent variable z, the corresponding synthetic image is produced by the pretrained decoder as s = g(z). This joint optimization significantly mitigates catastrophic forgetting, as the model is encouraged to selectively retain the most salient and task-relevant features from past tasks while discarding trivial information. As a result, the learned…
Figure 2. Last accuracy on tasks observed so far in the test set of S-CIFAR-100 (10, 50 tasks), S-TinyImageNet (20 tasks), and S-ImageNet.
Figure 3. t-SNE visualizations of synthetic features generated by…
Original abstract

Rehearsal-based continual learning (CL) mitigates catastrophic forgetting by maintaining a subset of samples from previous tasks for replay. Existing studies primarily focus on optimizing memory storage through coreset selection strategies. While these methods are effective, they typically require storing a substantial number of samples per class (SPC), often exceeding 20, to maintain satisfactory performance. In this work, we propose to further compress the memory footprint by synthesizing and storing prototypical exemplars, which can form representative prototypes when passed through the feature extractor. Owing to their representative nature, these exemplars enable the model to retain previous knowledge using only a small number of samples while preserving privacy. Moreover, we introduce a perturbation-based augmentation mechanism that generates synthetic variants of previous data during training, thereby enhancing CL performance. Extensive evaluations on widely used benchmark datasets and settings demonstrate that the proposed algorithm achieves superior performance compared to existing baselines, particularly in scenarios involving large-scale datasets and a high number of tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a rehearsal-based continual learning method that synthesizes and stores prototypical exemplars to compress the memory buffer far below the typical 20 samples per class. These exemplars are passed through the current feature extractor to form representative prototypes for replay, augmented by a perturbation mechanism that generates synthetic variants of prior-task data during training. The central claim is that this yields superior performance over existing baselines on standard benchmarks, especially on large-scale datasets and in regimes with a high number of sequential tasks, while also improving privacy by avoiding storage of real samples.

Significance. If the empirical claims are substantiated, the approach could meaningfully advance memory-efficient rehearsal methods by demonstrating that a handful of synthesized exemplars can preserve prior-task statistics sufficiently to mitigate catastrophic forgetting even as the feature extractor evolves. This would be particularly valuable for privacy-sensitive or resource-limited continual-learning deployments involving dozens of tasks.

major comments (2)
  1. [§3 Method] The description of prototypical exemplar condensation does not specify how the synthesis process incorporates or compensates for subsequent updates to the feature extractor. Because the exemplars are fixed after each task while the embedding space continues to shift, it is unclear whether the stored prototypes remain aligned with the drifted features; this alignment is load-bearing for the high-task-count superiority claim.
  2. [§4 Experiments] The reported results do not include ablations that isolate the effect of exemplar count versus task count, nor do they quantify how performance degrades when the perturbation augmentation is removed in the large-scale, high-task regime. Without these controls, it is difficult to attribute the claimed gains specifically to prototypical condensation rather than to standard rehearsal or augmentation effects.
minor comments (2)
  1. [Abstract] The abstract asserts quantitative superiority but supplies no numerical values, error bars, or SPC counts; moving at least one key result (e.g., average accuracy on the largest benchmark) into the abstract would improve readability.
  2. [§3 Method] Notation for the perturbation augmentation (e.g., the distribution from which perturbations are drawn) is introduced without an explicit equation; adding a short formal definition would improve reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below, clarifying the method details and committing to additional experiments where needed.

point-by-point responses
  1. Referee: [§3 Method] The description of prototypical exemplar condensation does not specify how the synthesis process incorporates or compensates for subsequent updates to the feature extractor. Because the exemplars are fixed after each task while the embedding space continues to shift, it is unclear whether the stored prototypes remain aligned with the drifted features; this alignment is load-bearing for the high-task-count superiority claim.

    Authors: The prototypical exemplars are synthesized once per task and stored in raw input space. At replay time in later tasks, these fixed exemplars are forwarded through the current (updated) feature extractor to compute the prototypes used for distillation and rehearsal. This on-the-fly projection automatically realigns the prototypes with any drift in the embedding space (see the sketch after these responses). We will revise §3 to make this forward-pass mechanism explicit, including a short derivation showing that the prototype computation remains well-defined under feature evolution. revision: yes

  2. Referee: [§4 Experiments] The reported results do not include ablations that isolate the effect of exemplar count versus task count, nor do they quantify how performance degrades when the perturbation augmentation is removed in the large-scale, high-task regime. Without these controls, it is difficult to attribute the claimed gains specifically to prototypical condensation rather than to standard rehearsal or augmentation effects.

    Authors: We agree that targeted ablations would strengthen attribution. We will add two new sets of experiments to §4: (1) performance curves varying exemplar count (1–10 SPC) across increasing task counts (5, 10, 20, 50) on the large-scale benchmarks, and (2) a direct comparison with and without the perturbation augmentation in the high-task regime, reporting the delta in average accuracy and forgetting. These results will be included in the revised manuscript. revision: yes
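
The realignment mechanism in response 1 reduces to recomputing prototypes through whatever extractor is current. A minimal sketch, assuming exemplars are stored as raw tensors keyed by class; current_prototypes and its arguments are hypothetical names, not the authors' code.

    # Sketch of the forward-pass realignment described in response 1: fixed
    # input-space exemplars are re-embedded by the *current* extractor, so any
    # drift in the embedding space is absorbed at replay time.
    import torch

    def current_prototypes(f_current, stored_exemplars):
        """stored_exemplars: dict class_id -> (k, C, H, W) tensor of condensed exemplars."""
        with torch.no_grad():
            return {c: f_current(x).mean(dim=0)  # prototype under today's embedding
                    for c, x in stored_exemplars.items()}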

Circularity Check

0 steps flagged

No circularity: empirical proposal evaluated on benchmarks

full rationale

The paper describes an algorithmic method for synthesizing prototypical exemplars via condensation and perturbation augmentation to reduce memory in rehearsal-based continual learning. Performance claims rest on empirical comparisons against baselines across standard datasets and task counts, with no equations, derivations, or self-citation chains that reduce results to fitted inputs by construction. The central premise (exemplars preserve statistics under feature drift) is tested experimentally rather than assumed via self-definition or imported uniqueness results, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Central claim depends on the unverified assumption that feature-extractor outputs from synthetic prototypes match the utility of real exemplars; no independent evidence for this equivalence is supplied in the abstract.

axioms (1)
  • domain assumption: the feature extractor produces embeddings that allow synthetic points to stand in for class distributions.
    Implicit in the claim that prototypical exemplars retain previous knowledge when passed through the extractor.
invented entities (1)
  • Prototypical exemplars (no independent evidence)
    purpose: Condensed synthetic representatives of prior task data
    Newly introduced mechanism for memory compression; no external validation cited.

pith-pipeline@v0.9.0 · 5477 in / 1141 out tokens · 31646 ms · 2026-05-15T11:42:07.711240+00:00 · methodology


Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 1 internal anchor

  1. [1] Nader Asadi, Mohammad Reza Davari, Sudhir Mudur, Rahaf Aljundi, and Eugene Belilovsky. Prototype-sample relation distillation: towards replay-free continual learning. In International Conference on Machine Learning, pages 1093–

  2. [2] Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha, and Jonghyun Choi. Rainbow memory: Continual learning with a memory of diverse samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8218–8227, 2021.

  3. [3] Jihwan Bang, Hyunseo Koh, Seulki Park, Hwanjun Song, Jung-Woo Ha, and Jonghyun Choi. Online continual learning on a contaminated data stream with blurry task boundaries. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9275–9284.

  4. [4] Ang Bian, Wei Li, Hangjie Yuan, Mang Wang, Zixiang Zhao, Aojun Lu, Pengliang Ji, Tao Feng, et al. Make continual learning stronger via C-Flat. Advances in Neural Information Processing Systems, 37:7608–7630, 2024.

  5. [5] Zalán Borsos, Mojmir Mutny, and Andreas Krause. Coresets via bilevel optimization for continual learning and streaming. Advances in Neural Information Processing Systems, 33:14879–14890, 2020.

  6. [6] Matteo Boschini, Lorenzo Bonicelli, Pietro Buzzega, Angelo Porrello, and Simone Calderara. Class-incremental continual learning into the extended DER-verse. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):5497–5512.

  7. [7] Pietro Buzzega, Matteo Boschini, Angelo Porrello, Davide Abati, and Simone Calderara. Dark experience for general continual learning: a strong, simple baseline. Advances in Neural Information Processing Systems, 33:15920–15930.

  8. [8] Huancheng Chen and Haris Vikalo. Mixed-precision quantization for federated learning on resource-constrained heterogeneous devices. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6138–6148, 2024.

  9. [9] Jinpeng Chen, Runmin Cong, Yuxuan Luo, Horace Ip, and Sam Kwong. Saving 100x storage: Prototype replay for reconstructing training sample distribution in class-incremental semantic segmentation. Advances in Neural Information Processing Systems, 36:35988–35999, 2023.

  10. [10] Mingyang Chen, Jiawei Du, Bo Huang, Yi Wang, Xiaobo Zhang, and Wei Wang. Influence-guided diffusion for dataset distillation. In The Thirteenth International Conference on Learning Representations, 2025.

  11. [11] Hongrong Cheng, Miao Zhang, and Javen Qinfeng Shi. A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.

  12. [12] Matthias De Lange and Tinne Tuytelaars. Continual prototype evolution: Learning online from non-stationary data streams. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8250–8259.

  13. [13] Junze Deng, Qinhang Wu, Peizhong Ju, Sen Lin, Yingbin Liang, and Ness Shroff. Unlocking the power of rehearsal in continual learning: A theoretical perspective. In Forty-second International Conference on Machine Learning, 2025.

  14. [14] Wenxiao Deng, Wenbin Li, Tianyu Ding, Lei Wang, Hongguang Zhang, Kuihua Huang, Jing Huo, and Yang Gao. Exploiting inter-sample and inter-feature relations in dataset distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17057–17066, 2024.

  15. [15] Shibhansh Dohare, J Fernando Hernandez-Garcia, Qingfeng Lan, Parash Rahman, A Rupam Mahmood, and Richard S Sutton. Loss of plasticity in deep continual learning. Nature, 632(8026):768–774, 2024.

  16. [16] Jiawei Du, Juncheng Hu, Wenxin Huang, Joey Tianyi Zhou, et al. Diversity-driven synthesis: Enhancing dataset distillation through directed weight adjustment. Advances in Neural Information Processing Systems, 37:119443–119465, 2024.

  17. [17] Alex Gomez-Villa, Dipam Goswami, Kai Wang, Andrew D Bagdanov, Bartlomiej Twardowski, and Joost van de Weijer. Exemplar-free continual representation learning via learnable drift compensation. In European Conference on Computer Vision, pages 473–490. Springer, 2024.

  18. [18] Dipam Goswami, Yuyang Liu, Bartłomiej Twardowski, and Joost Van De Weijer. FeCAM: Exploiting the heterogeneity of class distributions in exemplar-free continual learning. Advances in Neural Information Processing Systems, 36:6582–6595, 2023.

  19. [19] Dipam Goswami, Albin Soutif-Cormerais, Yuyang Liu, Sandesh Kamath, Bart Twardowski, Joost Van De Weijer, et al. Resurrecting old classes with new data for exemplar-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 28525–28534, 2024.

  20. [20] Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev, Haonan Wang, Wei Jiang, Yang You, and Yiran Chen. Efficient dataset distillation via minimax diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15793–15803, 2024.

  21. [21] Guy Hacohen and Tinne Tuytelaars. Predicting the susceptibility of examples to catastrophic forgetting. In Forty-second International Conference on Machine Learning, 2025.

  22. [22] Jie Hao, Kaiyi Ji, and Mingrui Liu. Bilevel coreset selection in continual learning: A new formulation and algorithm. Advances in Neural Information Processing Systems, 36:51026–51049, 2023.

  23. [23] Jiangpeng He and Fengqing Zhu. Exemplar-free online continual learning. In 2022 IEEE International Conference on Image Processing (ICIP), pages 541–545, 2022.

  24. [24] Stella Ho, Ming Liu, Lan Du, Longxiang Gao, and Yong Xiang. Prototype-guided memory replay for continual learning. IEEE Transactions on Neural Networks and Learning Systems, 35(8):10973–10983, 2023.

  25. [25] Saihui Hou, Xinyu Pan, Chen Change Loy, Zilei Wang, and Dahua Lin. Learning a unified classifier incrementally via rebalancing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 831–839.

  26. [26] Zixuan Hu, Yongxian Wei, Li Shen, Zhenyi Wang, Baoyuan Wu, Chun Yuan, and Dacheng Tao. Task-distributionally robust data-free meta-learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

  27. [27] Hongbin Lin, Yifan Zhang, Zhen Qiu, Shuaicheng Niu, Chuang Gan, Yanxia Liu, and Mingkui Tan. Prototype-guided continual adaptation for class-incremental unsupervised domain adaptation. In European Conference on Computer Vision, pages 351–368. Springer, 2022.

  28. [28] David Lopez-Paz and Marc'Aurelio Ranzato. Gradient episodic memory for continual learning. Advances in Neural Information Processing Systems, 30, 2017.

  29. [29] Chengtao Lv, Hong Chen, Jinyang Guo, Yifu Ding, and Xianglong Liu. PTQ4SAM: Post-training quantization for segment anything. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15941–15951, 2024.

  30. [30] Zhiheng Ma, Anjia Cao, Funing Yang, Yihong Gong, and Xing Wei. Curriculum dataset distillation. IEEE Transactions on Image Processing, 2025.

  31. [31] Khai Nguyen, Sam Schoedel, Anoushka Alavilli, Brian Plancher, and Zachary Manchester. TinyMPC: Model-predictive control on resource-constrained microcontrollers. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 1–7. IEEE, 2024.

  32. [32] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.

  33. [33] Tian Qin, Zhiwei Deng, and David Alvarez-Melis. A label is worth a thousand images in dataset distillation. Advances in Neural Information Processing Systems, 37:131946–131971, 2024.

  34. [34] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2001–2010, 2017.

  35. [35] Matthew Riemer, Ignacio Cases, Robert Ajemian, Miao Liu, Irina Rish, Yuhai Tu, and Gerald Tesauro. Learning to learn without forgetting by maximizing transfer and minimizing interference. In International Conference on Learning Representations.

  36. [36] Gobinda Saha, Isha Garg, and Kaushik Roy. Gradient projection memory for continual learning. In International Conference on Learning Representations, 2021.

  37. [37] Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. Continual learning with deep generative replay. Advances in Neural Information Processing Systems, 30, 2017.

  38. [38] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. Advances in Neural Information Processing Systems, 30, 2017.

  39. [39] Shixiang Tang, Dapeng Chen, Jinguo Zhu, Shijie Yu, and Wanli Ouyang. Layerwise optimization by gradient decomposition for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9634–9643, 2021.

  40. [40] Ruilin Tong, Yuhang Liu, Javen Qinfeng Shi, and Dong Gong. Coreset selection via reducible loss in continual learning. In The Thirteenth International Conference on Learning Representations, 2025.

  41. [41] Minh-Tuan Tran, Trung Le, Xuan-May Le, Thanh-Toan Do, and Dinh Phung. Enhancing dataset distillation via non-critical region refinement. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 10015–10024, 2025.

  42. [42] Edoardo Urettini and Antonio Carta. Online curvature-aware replay: Leveraging 2nd order information for online continual learning. In Forty-second International Conference on Machine Learning, 2025.

  43. [43] Kai Wang, Zekai Li, Zhi-Qi Cheng, Samir Khaki, Ahmad Sajedi, Ramakrishna Vedantam, Konstantinos N Plataniotis, Alexander Hauptmann, and Yang You. Emphasizing discriminative features for dataset distillation in complex scenarios. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 30451–30461, 2025.

  44. [44] Xinrui Wang, Shao-Yuan Li, Jiaqiang Zhang, and Songcan Chen. Cut out and replay: A simple yet versatile strategy for multi-label online continual learning. In Forty-second International Conference on Machine Learning, 2025.

  45. [45] Yujie Wei, Jiaxin Ye, Zhizhong Huang, Junping Zhang, and Hongming Shan. Online prototype learning for online continual learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18764–18774.

  46. [46] Yongxian Wei, Zixuan Hu, Li Shen, Zhenyi Wang, Chun Yuan, and Dacheng Tao. Open-vocabulary customization from CLIP via data-free knowledge distillation. In The Thirteenth International Conference on Learning Representations, 2025.

  47. [47] Chenshen Wu, Luis Herranz, Xialei Liu, Joost Van De Weijer, Bogdan Raducanu, et al. Memory replay GANs: Learning to generate new categories without forgetting. Advances in Neural Information Processing Systems, 31, 2018.

  48. [48] Yichen Wu, Long-Kai Huang, Renzhen Wang, Deyu Meng, and Ying Wei. Meta continual learning revisited: Implicitly enhancing online Hessian approximation via variance reduction. In The Twelfth International Conference on Learning Representations, 2024.

  49. [49] Enneng Yang, Li Shen, Zhenyi Wang, Tongliang Liu, and Guibing Guo. An efficient dataset condensation plugin and its application to continual learning. Advances in Neural Information Processing Systems, 36:67625–67642, 2023.

  50. [50] Haohan Yang, Yanxin Zhou, Jingda Wu, Haochen Liu, Lie Yang, and Chen Lv. Human-guided continual learning for personalized decision-making of autonomous driving. IEEE Transactions on Intelligent Transportation Systems, 2025.

  51. [51] Fei Ye and Adrian G Bors. Online task-free continual learning via dynamic expansionable memory distribution. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 20512–20522, 2025.

  52. [52] Zeyuan Yin, Eric Xing, and Zhiqiang Shen. Squeeze, recover and relabel: Dataset condensation at ImageNet scale from a new perspective. Advances in Neural Information Processing Systems, 36:73582–73603, 2023.

  53. [53] Jinsoo Yoo, Yunpeng Liu, Frank Wood, and Geoff Pleiss. Layerwise proximal replay: A proximal point method for online continual learning. In Forty-first International Conference on Machine Learning, 2024.

  54. [54] Jaehong Yoon, Divyam Madaan, Eunho Yang, and Sung Ju Hwang. Online coreset selection for rehearsal-based continual learning. In International Conference on Learning Representations, 2022.

  55. [55] Yaqian Zhang, Bernhard Pfahringer, Eibe Frank, Albert Bifet, Nick Jin Sean Lim, and Yunzhe Jia. Repeated augmented rehearsal: A simple but strong baseline for online continual learning. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022), New Orleans, LA, USA, November…

  56. [56] Bowen Zheng, Da-Wei Zhou, Han-Jia Ye, and De-Chuan Zhan. Multi-layer rehearsal feature augmentation for class-incremental learning. In Forty-first International Conference on Machine Learning, 2024.

  57. [57] Xinhao Zhong, Hao Fang, Bin Chen, Xulin Gu, Meikang Qiu, Shuhan Qi, and Shu-Tao Xia. Hierarchical features matter: A deep exploration of progressive parameterization method for dataset distillation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 30462–30471, 2025.

  58. [58] Da-Wei Zhou, Qi-Wei Wang, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan, and Ziwei Liu. Class-incremental learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.

  59. [59] Xiao Zhou, Renjie Pi, Weizhong Zhang, Yong Lin, Zonghao Chen, and Tong Zhang. Probabilistic bilevel coreset selection. In International Conference on Machine Learning, pages 27287–27302. PMLR, 2022.

  60. [60] Yuhao Zhou, Yuxin Tian, Jindi Lv, Mingjia Shi, Yuanxi Li, Qing Ye, Shuhao Zhang, and Jiancheng Lv. Ferret: An efficient online continual learning framework under varying memory constraints. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 4850–4861.

  61. [61] Dongyao Zhu, Bowen Lei, Jie Zhang, Yanbo Fang, Yiqun Xie, Ruqi Zhang, and Dongkuan Xu. Rethinking data distillation: Do not overlook calibration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4935–4945, 2023.

  62. [62] Fei Zhu, Xu-Yao Zhang, Chuang Wang, Fei Yin, and Cheng-Lin Liu. Prototype augmentation and self-supervision for incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5871–5880, 2021.

  63. [63] "…uses a generative model to synthesize realistic past data. Instead of storing actual old samples, the model is jointly trained on new data and the pseudo-samples generated on-demand. MeRGAN [47] employs replay alignment to enforce consistency of the generative data sampled with the same random noise between the old and new generative models, similar to…"

  64. [64] "…prototypes offer a lightweight alternative. Instead of storing entire data samples, we can reduce the memory footprint significantly [9]."

  65. [65] "Since the prototype captures the central tendency of the data within each class distribution, it can naturally facilitate data augmentation through simple transformation techniques, such as Gaussian noise injection [62]. Consequently, synthetic subsets can be generated by perturbing samples around the prototype, effectively producing diverse data points…"

  66. [66] "Prototypes provide a stable reference point for previously learned knowledge. By maintaining these prototypes, the model can preserve the structure of the embedding space. When learning a new task, the model is encouraged to keep new class embeddings close to their own prototypes while maintaining a clear distance from the prototypes of old classes…"