pith.

arXiv: 2605.14785 · v1 · pith:45VN4JFU · submitted 2026-05-14 · cs.LG · cs.CV

Understanding Imbalanced Forgetting in Rehearsal-Based Class-Incremental Learning

Pith reviewed 2026-06-30 21:41 UTC · model grok-4.3

classification: cs.LG · cs.CV
keywords: imbalanced forgetting · class-incremental learning · rehearsal · catastrophic forgetting · gradient interference · last-layer coefficients · predictive analysis · continual learning

The pith

Three last-layer gradient coefficients predict the forgetting ranking of past classes after each incremental training step in rehearsal-based class-incremental learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Rehearsal-based class-incremental learning produces systematic and severe imbalance in forgetting across past classes even when replay samples are evenly allocated. The paper derives three coefficients directly from last-layer gradients during an incremental step; each coefficient quantifies a distinct source of interference on a given past class. When combined, the three coefficients reliably forecast the relative amount of forgetting each past class will experience by the end of that step. One coefficient, which measures self-induced interference on a past class, emerges as the strongest single predictor, and controlled checks suggest it is itself shaped by interference from the newly introduced class.
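
The paper's own forgetting measure (FG) is not reproduced on this page. As a minimal sketch, assuming a past class's forgetting is simply its test-accuracy drop across one incremental step (a common stand-in, not necessarily the paper's definition), the per-class imbalance can be summarised as follows; `classwise_forgetting` and `forgetting_spread` are hypothetical helper names.

```python
def classwise_forgetting(acc_before, acc_after):
    """Per-class forgetting as the accuracy drop over one incremental step.

    acc_before / acc_after map class id -> accuracy in [0, 1] on that class's
    test samples, measured before and after the step. This is an assumed
    stand-in measure, not the paper's FG.
    """
    return {c: acc_before[c] - acc_after[c] for c in acc_before}


def forgetting_spread(forgetting):
    """Simple summaries of how unevenly forgetting is spread across past classes."""
    values = sorted(forgetting.values())
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the per-class forgetting values.
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return {
        "min": values[0],
        "max": values[-1],
        "range": values[-1] - values[0],
        "mean": mean,
        "std": std,
    }


f = classwise_forgetting({0: 0.75, 1: 0.75, 2: 0.5}, {0: 0.5, 1: 0.0, 2: 0.25})
print(f)                     # {0: 0.25, 1: 0.75, 2: 0.25}
print(forgetting_spread(f))
```

Under equal replay quotas, a large `range` or `std` relative to the `mean` is the kind of uneven spread the paper calls imbalanced forgetting.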

Core claim

From a principled analysis of gradients, three last-layer coefficients are constructed that capture different sources of interference affecting each past class during an incremental step. Together these coefficients reliably predict how the past classes will rank in terms of forgetting at the end of the step, supporting the view that last-layer gradient interactions during training drive the observed class-level forgetting outcomes, with the self-induced interference coefficient as the dominant term.

What carries the argument

Three last-layer coefficients that quantify distinct gradient-level interference sources (including self-induced interference and new-class interference) for each past class during an incremental training step.
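
The three coefficients are not spelled out on this page. To make "last-layer gradient interference" concrete, here is a minimal sketch under stated assumptions (a bias-free linear softmax head on frozen features, plain SGD, hypothetical helper names); it is not one of the paper's coefficients. It computes how one SGD step on a new-class sample shifts the class-c logit of a stored past-class sample.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def past_logit_shift(W, h_new, y_new, h_old, c, lr):
    """Change in logit z_c(h_old) caused by one SGD step on (h_new, y_new).

    Assumes a bias-free linear head z = W h on frozen features and softmax
    cross-entropy, whose row gradient is dL/dw_k = (p_k - [k == y_new]) * h_new.
    Only row c affects z_c, so the shift is exactly
    -lr * (p_c - [c == y_new]) * <h_new, h_old>.
    """
    p = softmax([dot(w, h_new) for w in W])
    g_c = p[c] - (1.0 if c == y_new else 0.0)
    return -lr * g_c * dot(h_new, h_old)


W = [[1.0, 0.0], [0.0, 1.0]]  # two classes, 2-d features (toy values)
print(past_logit_shift(W, h_new=[0.0, 1.0], y_new=1, h_old=[1.0, 1.0], c=0, lr=0.1))
```

The shift is negative whenever c is not the new sample's label and the two feature vectors are positively aligned, and it grows with the probability the model assigns to c on the new sample. Averaging such shifts over new-class and replayed samples would give one crude per-class interference score; the paper derives its three coefficients from a fuller analysis.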

If this is right

  • Imbalanced forgetting can be anticipated before the end of each incremental step by inspecting the three coefficients.
  • Mitigation strategies can target reduction of class-wise disparities in the identified interference sources.
  • Interventions focused on the self-induced interference coefficient are likely to have the largest effect on balancing forgetting.
  • Because the self-induced coefficient appears influenced by the new-class coefficient, managing new-class training dynamics may indirectly affect past-class self-interference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The coefficients could be used inside rehearsal-selection routines to choose samples that reduce predicted imbalances before training begins.
  • The same gradient-derived measures might serve as diagnostics in other rehearsal or regularization-based continual-learning regimes.
  • Directly modulating one coefficient while holding the others fixed would provide a stronger test of whether it causally drives forgetting.
  • Similar last-layer gradient summaries may reveal interference patterns in non-class-incremental settings such as task-incremental or domain-incremental learning.

Load-bearing premise

That the observed ability of the three coefficients to predict forgetting rankings establishes them as a mechanistic account of forgetting, even though prediction alone does not prove causation.

What would settle it

Compute the three coefficients at the start of an incremental step and then measure actual forgetting rankings after training; if the predicted order and the observed order disagree on a majority of classes across multiple runs, the predictive link fails.
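
One way to run that check, sketched under assumptions: score agreement between predicted and observed forgetting orders with Spearman's rank correlation (which the paper also uses in Figure 7), and read "disagree on a majority of classes" as the fraction of class pairs ordered differently exceeding one half. That pairwise reading is an interpretation rather than the page's definition, and all function names are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j share the mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def discordant_fraction(pred, obs):
    """Fraction of class pairs ordered differently by pred and obs (ties skipped)."""
    n = len(pred)
    disc = total = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (pred[i] - pred[j]) * (obs[i] - obs[j])
            if s == 0:
                continue
            total += 1
            disc += s < 0
    return disc / total if total else 0.0


predicted = [0.1, 0.4, 0.2, 0.9]  # toy predicted forgetting scores per class
observed = [5, 20, 10, 40]        # toy observed forgetting per class
print(spearman(predicted, observed), discordant_fraction(predicted, observed))
```

A `spearman` value near 1 together with a `discordant_fraction` well below 0.5, repeated across runs, would support the predictive link; a discordant fraction above 0.5 would count against it under this reading.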

Figures

Figures reproduced from arXiv: 2605.14785 by Alberto Tamajo, Rahman Attar, Srinandan Dasmahapatra.

Figure 1: Swarm plots illustrating the distribution of forgetting across past classes, as measured by FG (6), for randomly selected incremental steps from each …
Figure 2: Swarm plots illustrating the distribution of forgetting across past classes, as measured by FG (6), for randomly selected incremental steps from each …
Figure 3: Swarm plots illustrating the distribution of forgetting across past classes, as measured by FG (6), for randomly selected incremental steps from each …
Figure 4: Swarm plots illustrating the distribution of forgetting across past classes, as measured by FG (6), for randomly selected incremental steps from each …
Figure 5: Line plots showing, for the second and third incremental steps in …
Figure 6: Line plots showing, for the second and third incremental steps in …
Figure 7: Box-and-whisker plot illustrating the distribution of class-wise associations between NIC and SIC, measured across re-runs via Spearman’s correlation …
Original abstract

Neural networks suffer from catastrophic forgetting in class-incremental learning (CIL) settings. Rehearsal (replaying a subset of past samples) is a well-established mitigation strategy. However, recent results suggest that, despite balanced rehearsal allocation, some classes are forgotten substantially more than others. Despite its relevance, this imbalanced forgetting phenomenon remains underexplored. This work shows that imbalanced forgetting arises systematically and severely in rehearsal-based CIL and investigates it extensively. Specifically, we construct, from a principled analysis, three last-layer coefficients that capture different gradient-level sources of interference affecting each past class during an incremental step. We then demonstrate that, together, they reliably predict how past classes will rank in terms of forgetting at the end of that step. While predictive performance alone does not establish causality, these results support the interpretation of the coefficients as a plausible mechanistic account linking last-layer gradient-level interactions during training to class-level forgetting outcomes. Notably, one coefficient, capturing self-induced interference, emerges as the strongest predictor, with controlled experiments providing evidence consistent with this coefficient being influenced by the new-class interference coefficient. Overall, our findings provide valuable insights and suggest promising directions for mitigating imbalanced forgetting by reducing class-wise disparities in the identified sources of interference.
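
Balanced rehearsal allocation means every past class receives the same number of replay slots. A minimal sketch, assuming a fixed total memory and random exemplar choice (iCaRL [1] instead selects exemplars by herding); `balanced_rehearsal_buffer` is a hypothetical name.

```python
import random


def balanced_rehearsal_buffer(samples_by_class, memory_size, seed=0):
    """Split a fixed memory evenly across classes and fill it by random sampling.

    Each class gets floor(memory_size / n_classes) slots (fewer if it has fewer
    samples); leftover slots from the integer division are left unused.
    """
    rng = random.Random(seed)
    per_class = memory_size // len(samples_by_class)
    return {
        cls: rng.sample(samples, min(per_class, len(samples)))
        for cls, samples in samples_by_class.items()
    }


# Three classes with 100 toy sample ids each and a 20-slot memory: 6 per class.
data = {c: list(range(c * 100, c * 100 + 100)) for c in range(3)}
buffer = balanced_rehearsal_buffer(data, memory_size=20)
print({c: len(v) for c, v in buffer.items()})  # {0: 6, 1: 6, 2: 6}
```

Equal quotas fix only how many samples each class replays; the abstract's point is that forgetting still ends up strongly uneven across classes under such a buffer.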

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper investigates imbalanced forgetting in rehearsal-based class-incremental learning (CIL), where some past classes are forgotten more than others despite balanced rehearsal. From a gradient-level analysis of the last layer, the authors derive three coefficients capturing distinct sources of interference (including self-induced interference) for each past class during an incremental step. They show that these coefficients, taken together, reliably predict the ranking of past classes by forgetting severity at the end of the step. Controlled experiments are presented as consistent with the self-induced interference coefficient being influenced by the new-class interference coefficient. The work frames the coefficients as a plausible mechanistic account without claiming that predictive performance establishes causality.

Significance. If the reported predictive relations hold under the stated controls, the paper supplies a concrete, gradient-derived lens on a practically relevant but underexplored failure mode of rehearsal-based CIL. The explicit separation of three interference sources and the identification of self-induced interference as the strongest predictor constitute a clear, falsifiable contribution that can guide targeted mitigation strategies aimed at reducing class-wise disparities in those sources.

minor comments (3)
  1. [Abstract / §1] The abstract and introduction would benefit from an explicit statement of the precise definition of each of the three coefficients (e.g., the functional form involving gradients or logits) before the claim that they 'reliably predict' forgetting ranks; this would allow readers to assess the 'principled analysis' claim without first consulting later sections.
  2. [Experiments section / Tables] Figure captions and experimental tables should report the exact number of random seeds, the precise rehearsal buffer size relative to new-class size, and whether the rank-prediction metric is computed within each incremental step or aggregated across steps; these details are necessary to evaluate the robustness of the reported predictive performance.
  3. [Controlled experiments subsection] The controlled experiments that test influence between coefficients would be clearer if they included an ablation that isolates the new-class interference term while holding other factors fixed; the current description leaves open whether the observed consistency could arise from correlated but non-causal factors.
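
Minor comment 2's distinction between computing the rank-prediction metric within each incremental step and aggregating across steps can change the conclusion. The toy numbers below are invented for illustration and are not results from the paper: inside each step the predicted order is exactly reversed, yet pooling the two steps yields a positive Spearman correlation because the steps differ in overall level. The helper names are hypothetical and assume no ties.

```python
def ranks(values):
    """1-based ranks; this sketch assumes no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("ties are not handled in this sketch")
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r


def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n (n^2 - 1)), valid without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def within_step_mean(steps):
    """Average of Spearman correlations computed separately inside each step."""
    return sum(spearman_no_ties(p, o) for p, o in steps) / len(steps)


def pooled(steps):
    """One Spearman correlation after concatenating all steps."""
    pred = [v for p, _ in steps for v in p]
    obs = [v for _, o in steps for v in o]
    return spearman_no_ties(pred, obs)


# (predicted, observed) forgetting for three past classes in each of two steps.
steps = [([1, 2, 3], [12, 11, 10]), ([11, 12, 13], [22, 21, 20])]
print(within_step_mean(steps), pooled(steps))
```

Here `within_step_mean(steps)` is -1.0 while `pooled(steps)` is 19/35 ≈ 0.54, so a pooled score alone could mask per-step failures.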

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the supportive summary, the recognition of the contribution, and the recommendation for minor revision. No major comments appear in the provided report.

Circularity Check

0 steps flagged

No significant circularity

Rationale

The paper derives three coefficients via gradient analysis on last-layer weights and then tests their ability to rank-order forgetting on held-out incremental steps. No equation reduces a reported prediction to a fitted parameter defined on the same data; no self-citation chain supplies the central claim; the derivation is presented as an independent mechanistic hypothesis whose predictive utility is evaluated empirically rather than by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the coefficients are described as constructed from principled analysis of gradients without further detail on assumptions.

pith-pipeline@v0.9.1-grok · 5781 in / 1154 out tokens · 30292 ms · 2026-06-30T21:41:11.476339+00:00 · methodology


Reference graph

Works this paper leans on

61 extracted references · 6 canonical work pages · 2 internal anchors

  1. [1]

    icarl: Incremental classifier and representation learning,

    S. A. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert, “icarl: Incremental classifier and representation learning,” in Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017-January. Institute of Electrical and Electronics Engineers Inc., 2017, Conference Proceedings, pp. 5533–5542

  2. [2]

    Catastrophic interference in connectionist networks: The sequential learning problem,

    M. McCloskey and N. J. Cohen, Catastrophic interference in connectionist networks: The sequential learning problem. Elsevier, 1989, vol. 24, pp. 109–165

  3. [3]

    Connectionist models of recognition memory: constraints imposed by learning and forgetting functions,

    R. Ratcliff, “Connectionist models of recognition memory: constraints imposed by learning and forgetting functions,” Psychological review, vol. 97, no. 2, p. 285, 1990

  4. [4]

    An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

    I. J. Goodfellow, M. Mirza, D. Xiao, A. Courville, and Y. Bengio, “An empirical investigation of catastrophic forgetting in gradient-based neural networks,” arXiv preprint arXiv:1312.6211, 2013

  5. [5]

    Catastrophic forgetting, rehearsal and pseudorehearsal,

    A. Robins, “Catastrophic forgetting, rehearsal and pseudorehearsal,” Connection Science, vol. 7, no. 2, pp. 123–146, 1995

  6. [6]

    Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory,

    J. L. McClelland, B. L. McNaughton, and R. C. O’Reilly, “Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory,” Psychological review, vol. 102, no. 3, p. 419, 1995

  7. [7]

    Hippocampal and neocortical contributions to memory: Advances in the complementary learning systems framework,

    R. C. O’Reilly and K. A. Norman, “Hippocampal and neocortical contributions to memory: Advances in the complementary learning systems framework,” Trends in cognitive sciences, vol. 6, no. 12, pp. 505–510, 2002

  8. [8]

    Complementary learning systems,

    R. C. O’Reilly, R. Bhattacharyya, M. D. Howard, and N. Ketz, “Complementary learning systems,” Cognitive science, vol. 38, no. 6, pp. 1229–1248, 2014

  9. [9]

    Defying imbalanced forgetting in class incremental learning,

    S. Xu, G. Meng, X. Nie, B. Ni, B. Fan, and S. Xiang, “Defying imbalanced forgetting in class incremental learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, 2024, Conference Proceedings, pp. 16211–16219

  10. [10]

    Gradient episodic memory for continual learning,

    D. Lopez-Paz and M. Ranzato, “Gradient episodic memory for continual learning,” in Advances in Neural Information Processing Systems, I. Guyon, R. Fergus, H. Wallach, S. V. N. Vishwanathan, U. von Luxburg, R. Garnett, and S. Bengio, Eds., vol. 2017-December. Neural information processing systems...

  11. [11]

    Efficient lifelong learning with a-gem,

    A. Chaudhry, M. Ranzato, M. Rohrbach, and M. Elhoseiny, “Efficient lifelong learning with a-gem,” in 7th International Conference on Learning Representations, ICLR 2019. International Conference on Learning Representations, ICLR, 2019, Conference Proceedings

  12. [12]

    Anatomy of catastrophic forgetting: Hidden representations and task semantics,

    V. V. Ramasesh, E. Dyer, and M. Raghu, “Anatomy of catastrophic forgetting: Hidden representations and task semantics,” arXiv preprint arXiv:2007.07400, 2020

  13. [13]

    Maintaining discrimination and fairness in class incremental learning,

    B. Zhao, X. Xiao, G. Gan, B. Zhang, and S.-T. Xia, “Maintaining discrimination and fairness in class incremental learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, Conference Proceedings, pp. 13208–13217

  14. [14]

    Large scale incremental learning,

    Y. Wu, Y. Chen, L. Wang, Y. Ye, Z. Liu, Y. Guo, and Y. Fu, “Large scale incremental learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, Conference Proceedings, pp. 374–382

  15. [15]

    SS-IL: Separated softmax for incremental learning,

    H. Ahn, J. Kwak, S. Lim, H. Bang, H. Kim, and T. Moon, “SS-IL: Separated softmax for incremental learning,” in Proceedings of the IEEE/CVF International conference on computer vision, 2021, Conference Proceedings, pp. 844–853

  16. [16]

    Scail: Classifier weights scaling for class incremental learning,

    E. Belouadah and A. Popescu, “Scail: Classifier weights scaling for class incremental learning,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2020, Conference Proceedings, pp. 1266–1275

  17. [17]

    Learning multiple layers of features from tiny images,

    A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” 2009

  18. [18]

    Theory on forgetting and generalization of continual learning,

    S. Lin, P. Ju, Y. Liang, and N. Shroff, “Theory on forgetting and generalization of continual learning,” in International Conference on Machine Learning. PMLR, 2023, Conference Proceedings, pp. 21078–21100

  19. [19]

    Optimal task order for continual learning of multiple tasks,

    Z. Li and N. Hiratani, “Optimal task order for continual learning of multiple tasks,” in Forty-second International Conference on Machine Learning, Conference Proceedings

  20. [20]

    Tiny imagenet visual recognition challenge,

    Y. Le and X. Yang, “Tiny imagenet visual recognition challenge,” CS 231N, vol. 7, no. 7, p. 3, 2015

  21. [21]

    Rehearsal revealed: The limits and merits of revisiting samples in continual learning,

    E. Verwimp, M. De Lange, and T. Tuytelaars, “Rehearsal revealed: The limits and merits of revisiting samples in continual learning,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, Conference Proceedings, pp. 9385–9394

  22. [22]

    Multi-layer rehearsal feature augmentation for class-incremental learning,

    B. Zheng, D.-W. Zhou, H.-J. Ye, and D.-C. Zhan, “Multi-layer rehearsal feature augmentation for class-incremental learning,” in Forty-first International Conference on Machine Learning, 2024, Conference Proceedings

  23. [23]

    Improved sample complexities for deep neural networks and robust classification via an all-layer margin,

    C. Wei and T. Ma, “Improved sample complexities for deep neural networks and robust classification via an all-layer margin,” in International Conference on Learning Representations, 2020, Conference Proceedings

  24. [24]

    New insights on reducing abrupt representation change in online continual learning,

    L. Caccia, R. Aljundi, N. Asadi, T. Tuytelaars, J. Pineau, and E. Belilovsky, “New insights on reducing abrupt representation change in online continual learning,” in ICLR 2022 - 10th Conference on Learning Representations, 2022, Conference Proceedings

  25. [25]

    Continual learning by modeling intra-class variation,

    L. Yu, T. Hu, H. Lanqing, Z. Liu, A. Weller, and W. Liu, “Continual learning by modeling intra-class variation,” Transactions on Machine Learning Research, 2023

  26. [26]

    Continual learning in the teacher-student setup: Impact of task similarity,

    S. Lee, S. Goldt, and A. Saxe, “Continual learning in the teacher-student setup: Impact of task similarity,” in International Conference on Machine Learning. PMLR, 2021, Conference Proceedings, pp. 6109–6119

  27. [27]

    The joint effect of task similarity and overparameterization on catastrophic forgetting–an analytical model,

    D. Goldfarb, I. Evron, N. Weinberger, D. Soudry, and P. Hand, “The joint effect of task similarity and overparameterization on catastrophic forgetting–an analytical model,” in ICLR 2024 - 12th International Conference on Learning Representations, 2024, Conference Proceedings

  28. [28]

    How catastrophic can catastrophic forgetting be in linear regression?

    I. Evron, E. Moroshko, R. Ward, N. Srebro, and D. Soudry, “How catastrophic can catastrophic forgetting be in linear regression?” in Conference on Learning Theory. PMLR, 2022, Conference Proceedings, pp. 4028–4079

  29. [29]

    End-to-end incremental learning,

    F. M. Castro, M. J. Marín-Jiménez, N. Guil, C. Schmid, and K. Alahari, “End-to-end incremental learning,” in Proceedings of the European conference on computer vision (ECCV), 2018, Conference Proceedings, pp. 233–248

  30. [30]

    Siesta: Efficient online continual learning with sleep,

    M. Y. Harun, J. Gallardo, T. L. Hayes, R. Kemker, and C. Kanan, “Siesta: Efficient online continual learning with sleep,” arXiv preprint arXiv:2303.10725, 2023

  31. [31]

    Gradient surgery for multi-task learning,

    T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C. Finn, “Gradient surgery for multi-task learning,” Advances in neural information processing systems, vol. 33, pp. 5824–5836, 2020

  32. [32]

    Stochastic first- and zeroth-order methods for nonconvex stochastic programming,

    S. Ghadimi and G. Lan, “Stochastic first- and zeroth-order methods for nonconvex stochastic programming,” SIAM journal on optimization, vol. 23, no. 4, pp. 2341–2368, 2013

  33. [33]

    Class-incremental learning: A survey,

    D.-W. Zhou, Q.-W. Wang, Z.-H. Qi, H.-J. Ye, D.-C. Zhan, and Z. Liu, “Class-incremental learning: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  34. [34]

    Imagenet: A large-scale hierarchical image database,

    J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009, Conference Proceedings, pp. 248–255

  35. [35]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, Conference Proceedings, pp. 770–778

  36. [36]

    Data analysis using regression and multilevel/hierarchical models,

    A. Gelman and J. Hill, Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, 2007

  37. [37]

    Memory-efficient incremental learning through feature adaptation,

    A. Iscen, J. Zhang, S. Lazebnik, and C. Schmid, “Memory-efficient incremental learning through feature adaptation,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVI 16. Springer, 2020, Conference Proceedings, pp. 699–715

  38. [38]

    Memory-efficient class-incremental learning for image classification,

    H. Zhao, H. Wang, Y. Fu, F. Wu, and X. Li, “Memory-efficient class-incremental learning for image classification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 10, pp. 5966–5977, 2021

  39. [39]

    Continual learning with deep generative replay,

    H. Shin, J. K. Lee, J. Kim, and J. Kim, “Continual learning with deep generative replay,” Advances in neural information processing systems, vol. 30, 2017

  40. [40]

    Learning to remember: A synaptic plasticity driven framework for continual learning,

    O. Ostapenko, M. Puscas, T. Klein, P. Jahnichen, and M. Nabi, “Learning to remember: A synaptic plasticity driven framework for continual learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, Conference Proceedings, pp. 11321–11329

  41. [41]

    Ib-drr-incremental learning with information-back discrete representation replay,

    J. Jiang, E. Cetin, and O. Celiktutan, “Ib-drr-incremental learning with information-back discrete representation replay,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, Conference Proceedings, pp. 3533–3542

  42. [42]

    Class-incremental learning using diffusion model for distillation and replay,

    Q. Jodelet, X. Liu, Y. J. Phua, and T. Murata, “Class-incremental learning using diffusion model for distillation and replay,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, Conference Proceedings, pp. 3425–3433

  43. [43]

    Distilling the Knowledge in a Neural Network

    G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015

  44. [44]

    Learning without forgetting,

    Z. Li and D. Hoiem, “Learning without forgetting,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 12, pp. 2935–2947, 2017

  45. [45]

    Learning a unified classifier incrementally via rebalancing,

    S. Hou, X. Pan, C. C. Loy, Z. Wang, and D. Lin, “Learning a unified classifier incrementally via rebalancing,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, Conference Proceedings, pp. 831–839

  46. [46]

    Dark experience for general continual learning: a strong, simple baseline,

    P. Buzzega, M. Boschini, A. Porrello, D. Abati, and S. Calderara, “Dark experience for general continual learning: a strong, simple baseline,” Advances in neural information processing systems, vol. 33, pp. 15920–15930, 2020

  47. [47]

    Der: Dynamically expandable representation for class incremental learning,

    S. Yan, J. Xie, and X. He, “Der: Dynamically expandable representation for class incremental learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, Conference Proceedings, pp. 3014–3023

  48. [48]

    Foster: Feature boosting and compression for class-incremental learning,

    F.-Y. Wang, D.-W. Zhou, H.-J. Ye, and D.-C. Zhan, “Foster: Feature boosting and compression for class-incremental learning,” in European conference on computer vision. Springer, 2022, Conference Proceedings, pp. 398–414

  49. [49]

    A model or 603 exemplars: Towards memory-efficient class-incremental learning,

    D.-W. Zhou, Q.-W. Wang, H.-J. Ye, and D.-C. Zhan, “A model or 603 exemplars: Towards memory-efficient class-incremental learning,” in ICLR 2023 - 11th Conference on Learning Representations, 2023, Conference Proceedings

  50. [50]

    Riemannian walk for incremental learning: Understanding forgetting and intransigence,

    A. Chaudhry, P. K. Dokania, T. Ajanthan, and P. H. Torr, “Riemannian walk for incremental learning: Understanding forgetting and intransigence,” in Proceedings of the European conference on computer vision (ECCV), 2018, Conference Proceedings, pp. 532–547

  51. [51]

    Rainbow memory: Continual learning with a memory of diverse samples,

    J. Bang, H. Kim, Y. Yoo, J.-W. Ha, and J. Choi, “Rainbow memory: Continual learning with a memory of diverse samples,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, Conference Proceedings, pp. 8218–8227

  52. [52]

    Gradient based sample selection for online continual learning,

    R. Aljundi, M. Lin, B. Goujaud, and Y. Bengio, “Gradient based sample selection for online continual learning,” Advances in neural information processing systems, vol. 32, 2019

  53. [53]

    Introduction to core-sets: an updated survey,

    D. Feldman, “Introduction to core-sets: an updated survey,” arXiv preprint arXiv:2011.09384, 2020

  54. [54]

    Gcr: Gradient coreset based replay buffer selection for continual learning,

    R. Tiwari, K. Killamsetty, R. Iyer, and P. Shenoy, “Gcr: Gradient coreset based replay buffer selection for continual learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, Conference Proceedings, pp. 99–108

  55. [55]

    An empirical study of example forgetting during deep neural network learning

    M. Toneva, A. Sordoni, R. T. d. Combes, A. Trischler, Y. Bengio, and G. J. Gordon, “An empirical study of example forgetting during deep neural network learning,” arXiv preprint arXiv:1812.05159, 2018

  56. [56]

    Example forgetting and rehearsal in continual learning,

    B. Benkő, “Example forgetting and rehearsal in continual learning,” Pattern Recognition Letters, vol. 179, pp. 65–72, 2024

  57. [57]

    Coresets via bilevel optimization for continual learning and streaming,

    Z. Borsos, M. Mutny, and A. Krause, “Coresets via bilevel optimization for continual learning and streaming,” Advances in neural information processing systems, vol. 33, pp. 14879–14890, 2020

  58. [58]

    Sgdr: Stochastic gradient descent with warm restarts,

    I. Loshchilov and F. Hutter, “Sgdr: Stochastic gradient descent with warm restarts,” in 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings. International Conference on Learning Representations, ICLR, 2017, Conference Proceedings

  59. [59]

    Better bootstrap confidence intervals,

    B. Efron, “Better bootstrap confidence intervals,” Journal of the American Statistical Association, vol. 82, no. 397, pp. 171–185, 1987

  60. [60]

    Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties,

    J. G. MacKinnon and H. White, “Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties,” Journal of econometrics, vol. 29, no. 3, pp. 305–325, 1985.

    APPENDIX A: REHEARSAL-BASED APPROACHES IN CLASS-INCREMENTAL LEARNING. In addition to the standard formulation of rehearsal considered in the main body of this paper (...

  61. [61]

    FOSTER [48] improves efficiency by employing a model compression process based on knowledge distillation [43]

    introduces a new backbone at each incremental step and aggregates the features from all backbones, which are then passed to a shared final classifier. FOSTER [48] improves efficiency by employing a model compression process based on knowledge distillation [43]. MEMO [49] further optimizes the expansion protocol by selectively expanding only specialized ...