MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound

Chi-Nguyen Tran; Dao Sy Duy Minh; Huynh Trung Kiet; Long Tran-Thanh; Nguyen Lam Phu Quy; Phu-Hoa Pham

arxiv: 2605.11617 · v2 · pith:6U3QYYO4new · submitted 2026-05-12 · 💻 cs.LG · math.ST · stat.TH

Phu-Hoa Pham, Chi-Nguyen Tran, Nguyen Lam Phu Quy, Dao Sy Duy Minh, Huynh Trung Kiet, Long Tran-Thanh

Pith reviewed 2026-05-20 21:42 UTC · model grok-4.3

classification 💻 cs.LG · math.ST · stat.TH

keywords streaming decision trees · class-incremental learning · McDiarmid bound · continual learning · Gini splitting · quantile sketches · online learning

The pith

Streaming decision trees can handle online class-incremental learning reliably by using a McDiarmid bound that keeps split confidence independent of class count.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Streaming decision trees are attractive for continual learning because they update locally and use bounded memory, but they fail when new classes arrive because their split criteria lose reliability as the class count K increases. The cause is that the range of information gain scales with log₂ K, so any confidence bound derived from that range must widen as classes are added. MIST replaces it with a McDiarmid-derived radius for Gini splits that stays tight and K-independent and that also regularizes the tree structure. It further transfers knowledge from parent to child nodes through a Bayesian protocol based on truncated-Gaussian moments, and it keeps KLL sketches at the leaves to support flexible threshold search and geometry-aware predictions. According to the abstract, MIST is competitive with global parametric methods on near-Gaussian benchmarks and stays robust on non-Gaussian streams where other methods collapse.

Core claim

MIST resolves both failures through three integrated components: (i) a tight, K-independent McDiarmid confidence radius for Gini splitting that acts as a structural regulariser; (ii) a Bayesian inheritance protocol that projects parent statistics to child nodes via truncated-Gaussian moments, with variance reduction guarantees strongest precisely when splitting is most conservative; and (iii) per-leaf KLL quantile sketches that support both continuous threshold evaluation and geometry-adaptive leaf prediction from a single data structure.
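Component (ii) relies on truncated-Gaussian moments. This page does not spell out the paper's inheritance protocol, so the following is only a minimal sketch of the standard closed-form mean and variance of a univariate Gaussian restricted to one side of a split threshold. The function name `truncated_gaussian_moments` and the `side` argument are illustrative assumptions, not the authors' API.

```python
import math


def _pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_gaussian_moments(mu: float, sigma: float, threshold: float, side: str):
    """Mean and variance of N(mu, sigma^2) restricted to one side of `threshold`.

    side="left" keeps x <= threshold, side="right" keeps x > threshold.
    Hypothetical helper: the paper's actual projection may differ.
    """
    alpha = (threshold - mu) / sigma
    if side == "left":
        mass = _cdf(alpha)
        if mass <= 0.0:
            raise ValueError("no probability mass on the left side")
        lam = _pdf(alpha) / mass
        mean = mu - sigma * lam
        var = sigma ** 2 * (1.0 - alpha * lam - lam ** 2)
    elif side == "right":
        mass = 1.0 - _cdf(alpha)
        if mass <= 0.0:
            raise ValueError("no probability mass on the right side")
        lam = _pdf(alpha) / mass
        mean = mu + sigma * lam
        var = sigma ** 2 * (1.0 + alpha * lam - lam ** 2)
    else:
        raise ValueError("side must be 'left' or 'right'")
    return mean, var


# A parent feature modelled as N(0, 1), split at its mean.
left_mean, left_var = truncated_gaussian_moments(0.0, 1.0, 0.0, "left")
right_mean, right_var = truncated_gaussian_moments(0.0, 1.0, 0.0, "right")
```

In this example both child variances are smaller than the parent variance, which is the kind of variance reduction the core claim refers to. How the paper turns these moments into a prior for the child node is not shown here.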

What carries the argument

K-independent McDiarmid confidence radius for Gini splitting used as a structural regulariser, together with Bayesian inheritance via truncated-Gaussian moments and per-leaf KLL quantile sketches.

If this is right

Streaming decision trees gain the ability to maintain reliable splits even as new classes are introduced over time.
The approach achieves competitive performance with global parametric methods on near-Gaussian benchmarks.
MIST shows robustness on non-Gaussian data geometries where other state-of-the-art methods collapse.
The Bayesian protocol provides the strongest variance reduction for the most conservative splits.
Leaf predictions can adapt to local data geometry using the same quantile structure used for thresholds.
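To make the threshold-evaluation idea concrete, here is a small sketch of scoring candidate split points by Gini gain, with the candidates taken from quantiles of the observed feature values. It uses exact order statistics from a sorted list as a stand-in for KLL sketch queries, which is an assumption for brevity; the helper names `quantile_thresholds` and `best_split` are hypothetical and not from the paper.

```python
from collections import Counter


def gini(counts: Counter) -> float:
    """Gini impurity 1 - sum_k p_k^2 from per-class counts."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def quantile_thresholds(values, num_quantiles):
    """Candidate thresholds at evenly spaced ranks.

    Exact order statistics stand in for approximate sketch queries.
    """
    s = sorted(values)
    ranks = ((i * len(s)) // (num_quantiles + 1) for i in range(1, num_quantiles + 1))
    return sorted({s[min(len(s) - 1, r)] for r in ranks})


def best_split(values, labels, num_quantiles=7):
    """Return (threshold, gini_gain) of the best candidate, or (None, 0.0)."""
    n = len(values)
    parent = gini(Counter(labels))
    best = (None, 0.0)
    for t in quantile_thresholds(values, num_quantiles):
        left = Counter(y for x, y in zip(values, labels) if x <= t)
        right = Counter(y for x, y in zip(values, labels) if x > t)
        n_left, n_right = sum(left.values()), sum(right.values())
        if n_left == 0 or n_right == 0:
            continue  # a split must send samples to both children
        gain = parent - (n_left / n) * gini(left) - (n_right / n) * gini(right)
        if gain > best[1]:
            best = (t, gain)
    return best


values = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, gain = best_split(values, labels)
```

In a streaming setting the sorted list would be replaced by a bounded-memory quantile summary, and the class counts on each side would have to be estimated from that summary rather than recomputed from raw samples.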

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This mechanism could be adapted to other streaming models that suffer from output space growth.
Future work might explore applying similar McDiarmid regularisation to different impurity measures.
Testing the method on high-dimensional or image data streams would check if the robustness extends beyond tabular cases.

Load-bearing premise

The range of information gain scales with log base 2 of the class count, so bounds based on it cannot stay independent of K.
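The premise can be illustrated numerically with the radius used by classic Hoeffding trees, ε = √(R² ln(1/δ) / (2n)), where R is the range of the split statistic. The sketch below plugs in R = log₂ K for information gain; the function names and the default δ are illustrative choices, not values taken from the paper.

```python
import math


def hoeffding_radius(value_range: float, n: int, delta: float) -> float:
    """Hoeffding confidence radius for a statistic with the given range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def info_gain_radius(num_classes: int, n: int, delta: float = 1e-7) -> float:
    """Radius when the statistic is information gain, whose range is log2(K)."""
    return hoeffding_radius(math.log2(num_classes), n, delta)


# Same sample size and confidence, growing class count.
radii = {k: info_gain_radius(k, n=1000) for k in (2, 4, 16, 256)}
```

With n and δ fixed, the radius is proportional to log₂ K, so going from 2 to 256 classes multiplies it by 8. A leaf therefore needs about 64 times as many samples to reach the same radius.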

What would settle it

An experiment on a class-incremental stream showing that MIST's tree growth and accuracy remain stable as the number of classes increases, while Hoeffding-based trees degrade in split quality.

Figures

Figures reproduced from arXiv: 2605.11617 by Chi-Nguyen Tran, Dao Sy Duy Minh, Huynh Trung Kiet, Long Tran-Thanh, Nguyen Lam Phu Quy, Phu-Hoa Pham.

**Figure 2.** Tree-dynamics diagnostics on Synth-50, Covertype, and Split-MNIST (standard, un…

**Figure 3.** 2D PCA visualisations of the eight stress-test streams (50 samples per class shown). Each…

Original abstract

Streaming decision trees are natural candidates for open-world continual learning, as they perform local updates, enjoy bounded memory, and maintain static decision boundaries. Despite these advantages, they still fail in online class-incremental learning due to two coupled miscalibrations: (i) their split criterion grows unreliable as the class count K expands, and (ii) they lack knowledge transfer at split time. Both failures share a common root: the range of Information Gain intrinsically scales with log₂ K. Consequently, any Hoeffding-style confidence radius derived from it must inevitably grow with the class count, making a K-independent split criterion structurally impossible and removing the potential benefits of applying streaming decision trees to continual learning. To fix this issue, we present MIST (McDiarmid Incremental Streaming Tree), which resolves both failures through three integrated components: (i) a tight, K-independent McDiarmid confidence radius for Gini splitting that acts as a structural regulariser; (ii) a Bayesian inheritance protocol that projects parent statistics to child nodes via truncated-Gaussian moments, with variance reduction guarantees strongest precisely when splitting is most conservative; and (iii) per-leaf KLL quantile sketches that support both continuous threshold evaluation and geometry-adaptive leaf prediction from a single data structure. On standard and stress-test tabular streams, MIST is competitive with global parametric methods on near-Gaussian benchmarks and uniquely robust on non-Gaussian geometry where SOTA baselines collapse.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces MIST, a streaming decision tree for online class-incremental learning. It diagnoses two failures in prior Hoeffding-based trees: split criteria become unreliable as class count K grows because Information Gain ranges scale with log K, and there is no mechanism for knowledge transfer at split time. MIST replaces the split criterion with a McDiarmid bound on Gini impurity whose bounded-difference constant is at most 2/n + O(1/n²) independent of K, adds a Bayesian inheritance step that matches the first two moments of a truncated Gaussian to project parent statistics to children (with variance reduction strongest under conservative splits), and equips each leaf with a KLL quantile sketch supporting both continuous threshold search and geometry-adaptive prediction. Experiments on standard and stress-test tabular streams show competitiveness with global parametric methods on near-Gaussian data and robustness where other streaming baselines collapse on non-Gaussian geometry.

Significance. If the K-independence of the McDiarmid radius and the stated variance-reduction property of the truncated-Gaussian inheritance hold, the work supplies a concrete, distribution-free route to reliable splitting in growing-class continual-learning settings. The replacement of range-dependent Hoeffding bounds by a Gini-specific McDiarmid construction, together with the dual-use KLL sketch, is a technically clean integration that directly targets the scaling pathology identified in the introduction. The absence of hidden K-dependent terms in the bounded-difference argument and the moment-matching step strengthens the central claim.

major comments (1)

[§3] §3 (McDiarmid radius derivation): the central claim that the radius remains K-independent rests on the bounded-difference constant for Gini being at most 2/n + O(1/n²). The manuscript sketches the argument but does not display the explicit application of McDiarmid’s inequality to the multi-class Gini index; an expanded derivation (or appendix) is needed to confirm that no implicit dependence on the support size K enters the final radius expression.

minor comments (2)

[Abstract and §4.2] The abstract and §5 refer to “truncated-Gaussian moment parameters” without stating whether these are fixed once and for all or tuned per stream; a single sentence clarifying their status would remove ambiguity.
[§6] Table captions and axis labels in the experimental section use inconsistent abbreviations for the baseline methods; harmonizing notation with the text would improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment and constructive feedback on our manuscript. We address the major comment below and will incorporate the requested clarification in the revised version.

Referee: [§3] §3 (McDiarmid radius derivation): the central claim that the radius remains K-independent rests on the bounded-difference constant for Gini being at most 2/n + O(1/n²). The manuscript sketches the argument but does not display the explicit application of McDiarmid’s inequality to the multi-class Gini index; an expanded derivation (or appendix) is needed to confirm that no implicit dependence on the support size K enters the final radius expression.

Authors: We agree that an explicit, self-contained derivation would strengthen the presentation and remove any ambiguity. In the revised manuscript we will add a dedicated appendix that applies McDiarmid’s inequality directly to the multi-class Gini index. The appendix will (i) state the bounded-difference condition for a single sample label change, (ii) compute the maximum change in Gini impurity (which is bounded by 2/n + O(1/n²) because the impurity is a normalized quadratic form over the class probabilities), and (iii) show that the resulting concentration radius contains no hidden dependence on the support size K. This will confirm that the K-independence is structural rather than incidental. revision: yes
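For orientation, the single-label-change computation can be worked out for the plain Gini impurity of one node. This is a sketch under that simplifying assumption only: the paper's split criterion combines child impurities, and its stated constants (2/n + O(1/n²) here, 4/n in the quoted Theorem 5.1) are not derived below. The symbols n_k (count of class k), a, and b are introduced for illustration.

```latex
% Gini impurity of a node holding n samples with class counts n_1, ..., n_K
G = 1 - \sum_{k=1}^{K} \left(\frac{n_k}{n}\right)^{2}

% Relabel one sample from class a to class b (n_a >= 1):
% n_a -> n_a - 1, n_b -> n_b + 1, all other counts unchanged.
G' - G
  = -\frac{(n_a - 1)^{2} + (n_b + 1)^{2} - n_a^{2} - n_b^{2}}{n^{2}}
  = \frac{2\,(n_a - n_b - 1)}{n^{2}}

% Since 1 <= n_a <= n and 0 <= n_b <= n - 1, we have |n_a - n_b - 1| <= n - 1, so
\lvert G' - G \rvert \le \frac{2\,(n - 1)}{n^{2}} < \frac{2}{n}
```

Only the two affected counts enter the difference, so the number of classes K does not appear in the bound for this single-node case.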

Circularity Check

0 steps flagged

No significant circularity identified

The paper grounds its K-independent split criterion in the McDiarmid inequality applied to Gini impurity, an external concentration result whose bounded-difference constant is shown to be independent of class count K by direct calculation of the effect of a single label flip. The Bayesian inheritance protocol and per-leaf KLL sketches are introduced as new algorithmic components whose moment-matching and distribution-free properties are derived without reference to fitted parameters or target performance metrics that would create a definitional loop. No self-citations are invoked to establish uniqueness theorems or to smuggle in ansatzes; the central derivation therefore remains self-contained and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method rests on the standard McDiarmid inequality applied to the Gini criterion and on the assumption that truncated-Gaussian moments provide a valid projection of parent statistics; no new physical entities are postulated.

free parameters (1)

truncated-Gaussian moment parameters
Parameters controlling the projection from parent to child node statistics in the Bayesian inheritance protocol.

axioms (1)

standard math: the McDiarmid inequality can be applied directly to the Gini splitting criterion to produce a K-independent radius
Invoked to replace Hoeffding-style bounds that scale with log K.

pith-pipeline@v0.9.0 · 5822 in / 1410 out tokens · 69646 ms · 2026-05-20T21:42:13.549522+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: Theorem 5.1 (Tightness of the Gini Sensitivity). … c_i ≤ 4/n. Moreover, this rate is tight …

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Paper passage: Corollary 4.1 (Operational McDiarmid Radius). … ε = √(32 ln(2dm/δ)/n)
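The radius quoted from Corollary 4.1 can be evaluated directly. In the sketch below the formula is taken verbatim from that passage. The symbols d and m are not defined on this page, so reading them as the number of features and the number of candidate thresholds per feature is an assumption, and both function names are hypothetical.

```python
import math


def mcdiarmid_radius(n: int, delta: float, d: int, m: int) -> float:
    """Evaluate eps = sqrt(32 * ln(2*d*m/delta) / n) as quoted from Corollary 4.1.

    d and m are assumed to be feature and candidate-threshold counts.
    """
    return math.sqrt(32.0 * math.log(2.0 * d * m / delta) / n)


def samples_needed(epsilon: float, delta: float, d: int, m: int) -> int:
    """Smallest n for which the quoted radius is at most `epsilon`."""
    return math.ceil(32.0 * math.log(2.0 * d * m / delta) / epsilon ** 2)


# The expression contains no class count K: only n, delta, d and m appear.
eps_small = mcdiarmid_radius(n=5_000, delta=1e-3, d=10, m=16)
eps_large = mcdiarmid_radius(n=20_000, delta=1e-3, d=10, m=16)
n_required = samples_needed(epsilon=0.1, delta=1e-3, d=10, m=16)
```

Quadrupling n halves the radius, and the required sample size grows only logarithmically in d, m, and 1/δ.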

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages · 5 internal anchors

[1]

Memory aware synapses: Learning what (not) to forget

Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, and Tinne Tuyte- laars. Memory aware synapses: Learning what (not) to forget. InProceedings of the European Conference on Computer Vision (ECCV), September 2018

work page 2018
[2]

Online continual learning with maximal interfered retrieval

Rahaf Aljundi, Eugene Belilovsky, Tinne Tuytelaars, Laurent Charlin, Massimo Caccia, Min Lin, and Lucas Page-Caccia. Online continual learning with maximal interfered retrieval. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019

work page 2019
[3]

Expert gate: Lifelong learning with a network of experts

Rahaf Aljundi, Punarjay Chakravarty, and Tinne Tuytelaars. Expert gate: Lifelong learning with a network of experts. In2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7120–7129, 2017

work page 2017
[4]

Gradient based sample selection for online continual learning

Rahaf Aljundi, Min Lin, Baptiste Goujaud, and Yoshua Bengio. Gradient based sample selection for online continual learning. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019

work page 2019
[5]

Il2m: Class incremental learning with dual memory

Eden Belouadah and Adrian Popescu. Il2m: Class incremental learning with dual memory. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 583–592, 2019

work page 2019
[6]

A streaming parallel decision tree algorithm.J

Yael Ben-Haim and Elad Tom-Tov. A streaming parallel decision tree algorithm.J. Mach. Learn. Res., 11:849–872, March 2010

work page 2010
[7]

Learning from time-changing data with adaptive windowing

Albert Bifet and Ricard Gavaldà. Learning from time-changing data with adaptive windowing. volume 7, 04 2007

work page 2007
[8]

New ensemble methods for evolving data streams

Albert Bifet, Geoff Holmes, Bernhard Pfahringer, Richard Kirkby, and Ricard Gavaldà. New ensemble methods for evolving data streams. InProceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’09, page 139–148, New York, NY , USA, 2009. Association for Computing Machinery

work page 2009
[9]

Dark experience for general continual learning: a strong, simple baseline

Pietro Buzzega, Matteo Boschini, Angelo Porrello, Davide Abati, and Simone Calderara. Dark experience for general continual learning: a strong, simple baseline. InProceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20, Red Hook, NY , USA, 2020. Curran Associates Inc

work page 2020
[10]

New insights on reducing abrupt representation change in online continual learning

Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, and Eugene Belilovsky. New insights on reducing abrupt representation change in online continual learning. InInternational Conference on Learning Representations (ICLR), 2022

work page 2022
[11]

Online learning of decision trees with Thompson sampling

Ayman Chaouki, Jesse Read, and Albert Bifet. Online learning of decision trees with Thompson sampling. In Sanjoy Dasgupta, Stephan Mandt, and Yingzhen Li, editors,Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, volume 238 of Proceedings of Machine Learning Research, pages 2944–2952. PMLR, 02–04 May 2024

work page 2024
[12]

Dokania, Thalaiyasingam Ajanthan, and Philip H

Arslan Chaudhry, Puneet K. Dokania, Thalaiyasingam Ajanthan, and Philip H. S. Torr. Rie- mannian walk for incremental learning: Understanding forgetting and intransigence. In Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss, editors,Computer Vision – ECCV 2018, pages 556–572, Cham, 2018. Springer International Publishing

work page 2018
[13]

Efficient Lifelong Learning with A-GEM

Arslan Chaudhry, Marc’Aurelio Ranzato, Marcus Rohrbach, and Mohamed Elhoseiny. Efficient lifelong learning with A-GEM.CoRR, abs/1812.00420, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[14]

Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet Ku- mar Dokania, Philip H. S. Torr, and Marc’Aurelio Ranzato. Continual learning with tiny episodic memories.CoRR, abs/1902.10486, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1902
[15]

A continual learning survey: Defying forgetting in classification tasks.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7):3366–3385, 2022

Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Aleš Leonardis, Gregory Slabaugh, and Tinne Tuytelaars. A continual learning survey: Defying forgetting in classification tasks.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7):3366–3385, 2022. 11

work page 2022
[16]

Splitting with confidence in decision trees with application to stream mining

Rocco De Rosa and Nicolò Cesa-Bianchi. Splitting with confidence in decision trees with application to stream mining. In2015 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2015

work page 2015
[17]

Mining high-speed data streams

Pedro Domingos and Geoff Hulten. Mining high-speed data streams. InProceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’00, page 71–80, New York, NY , USA, 2000. Association for Computing Machinery

work page 2000
[18]

Robert M. French. Catastrophic forgetting in connectionist networks.Trends in Cognitive Sciences, 3(4):128–135, 1999

work page 1999
[19]

João Gama, Pedro Medas, Gladys Castillo, and Pedro Rodrigues.Learning with Drift Detection, volume 8, pages 286–295. 09 2004

work page 2004
[20]

Accurate decision trees for mining high-speed data streams

João Gama, Ricardo Rocha, and Pedro Medas. Accurate decision trees for mining high-speed data streams. pages 523–528, 08 2003

work page 2003
[21]

Díaz-Redondo

Pablo García-Santaclara, Bruno Fernández-Castro, and Rebeca P. Díaz-Redondo. Overcom- ing catastrophic forgetting in tabular data classification: A pseudorehearsal-based approach. Engineering Applications of Artificial Intelligence, 156:110908, 2025

work page 2025
[22]

Torr, and Bernard Ghanem

Yasir Ghunaim, Adel Bibi, Kumail Alhamoud, Motasem Alfarra, Hasan Abed Al Kader Ham- moud, Ameya Prabhu, Philip H.S. Torr, and Bernard Ghanem. Real-time evaluation in online continual learning: A new hope. In2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11888–11897, 2023

work page 2023
[23]

Continual contrastive learning on tabular data with out of distribution

Achmad Ginanjar, Xue Li, Priyanka Singh, and Wen Hua. Continual contrastive learning on tabular data with out of distribution. InESANN 2025 proceedings, ESANN 2025, page 93–98. Ciaco - i6doc.com, 2025

work page 2025
[24]

Adaptive random forests for evolving data stream classification.Machine Learning, 106:1–27, 10 2017

Heitor Murilo Gomes, Albert Bifet, Jesse Read, Jean Paul Barddal, Fabrício Enembreck, Bernhard Pfahringer, Geoff Holmes, and Talel Abdessalem. Adaptive random forests for evolving data stream classification.Machine Learning, 106:1–27, 10 2017

work page 2017
[25]

An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

Ian Goodfellow, Mehdi Mirza, Xia Da, and Aaron Courville. An empirical investigation of catastrophic forgetting in gradient-based neural networks.arXiv preprint arXiv:1312.6211, 12 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[26]

Fecam: Exploiting the heterogeneity of class distributions in exemplar-free continual learning

Dipam Goswami, Yuyang Liu, Bartłomiej Twardowski, and Joost van de Weijer. Fecam: Exploiting the heterogeneity of class distributions in exemplar-free continual learning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 6582–6595. Curran Associates, Inc., 2023

work page 2023
[27]

Hayes, Kushal Kafle, Robik Shrestha, Manoj Acharya, and Christopher Kanan

Tyler L. Hayes, Kushal Kafle, Robik Shrestha, Manoj Acharya, and Christopher Kanan. Remind your neural network to prevent catastrophic forgetting. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors,Computer Vision – ECCV 2020, pages 466–483, Cham, 2020. Springer International Publishing

work page 2020
[28]

Hayes and Christopher Kanan

Tyler L. Hayes and Christopher Kanan. Lifelong machine learning with deep streaming linear discriminant analysis. In2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 887–896, 2020

work page 2020
[29]

Continual learning for unsupervised anomaly detection in continuous auditing of financial accounting data, 2022

Hamed Hemati, Marco Schreyer, and Damian Borth. Continual learning for unsupervised anomaly detection in continuous auditing of financial accounting data, 2022

work page 2022
[30]

Time-uniform, nonparametric, nonasymptotic confidence sequences.The Annals of Statistics, 49, 04 2021

Steven Howard, Aaditya Ramdas, Jon McAuliffe, and Jagmohan Sekhon. Time-uniform, nonparametric, nonasymptotic confidence sequences.The Annals of Statistics, 49, 04 2021

work page 2021
[31]

Re-evaluating continual learning scenarios: A categorization and case for strong baselines, 2019

Yen-Chang Hsu, Yen-Cheng Liu, Anita Ramasamy, and Zsolt Kira. Re-evaluating continual learning scenarios: A categorization and case for strong baselines, 2019

work page 2019
[32]

Mining time-changing data streams

Geoff Hulten, Laurie Spencer, and Pedro Domingos. Mining time-changing data streams. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discov- ery and Data Mining, KDD ’01, page 97–106, New York, NY , USA, 2001. Association for Computing Machinery. 12

work page 2001
[33]

Selective experience replay for lifelong learning

David Isele and Akansel Cosgun. Selective experience replay for lifelong learning. AAAI’18/IAAI’18/EAAI’18. AAAI Press, 2018

work page 2018
[34]

Estimating continuous distributions in bayesian classifiers

George John and Pat Langley. Estimating continuous distributions in bayesian classifiers. Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, 1, 02 2013

work page 2013
[35]

Optimal quantile approximation in streams

Zohar Karnin, Kevin Lang, and Edo Liberty. Optimal quantile approximation in streams. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 71–78, 2016

work page 2016
[36]

Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. Overcoming catas- trophic forgetting in neural networks.Proceedings of the National Academy of Sciences, 114(13):352...

work page 2017
[37]

Class-incremental experience replay for continual learning under concept drift

Łukasz Korycki and Bartosz Krawczyk. Class-incremental experience replay for continual learning under concept drift. In2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 3644–3653, 2021

work page 2021
[38]

Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges.Inf

Timothée Lesort, Vincenzo Lomonaco, Andrei Stoian, Davide Maltoni, David Filliat, and Natalia Díaz-Rodríguez. Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges.Inf. Fusion, 58(C):52–68, June 2020

work page 2020
[39]

Learning without forgetting.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2018

Zhizhong Li and Derek Hoiem. Learning without forgetting.IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2018

work page 2018
[40]

Gradient episodic memory for continual learning

David Lopez-Paz and Marc’Aurelio Ranzato. Gradient episodic memory for continual learning. InProceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, page 6470–6479, Red Hook, NY , USA, 2017. Curran Associates Inc

work page 2017
[41]

Online continual learning in image classification: An empirical survey.Neurocomputing, 469:28–51, 2022

Zheda Mai, Ruiwen Li, Jihwan Jeong, David Quispe, Hyunwoo Kim, and Scott Sanner. Online continual learning in image classification: An empirical survey.Neurocomputing, 469:28–51, 2022

work page 2022
[42]

Piggyback: Adapting a single network to multiple tasks by learning to mask weights

Arun Mallya, Dillon Davis, and Svetlana Lazebnik. Piggyback: Adapting a single network to multiple tasks by learning to mask weights. InComputer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part IV, page 72–88, Berlin, Heidelberg, 2018. Springer-Verlag

work page 2018
[43]

PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning

Arun Mallya and Svetlana Lazebnik. PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning . In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7765–7773, Los Alamitos, CA, USA, June 2018. IEEE Computer Society

work page 2018
[44]

Webb, and Mahsa Salehi

Chaitanya Manapragada, Geoffrey I. Webb, and Mahsa Salehi. Extremely fast decision tree. InProceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’18, page 1953–1962, New York, NY , USA, 2018. Association for Computing Machinery

work page 1953
[45]

Michael McCloskey and Neal J. Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. volume 24 ofPsychology of Learning and Motivation, pages 109–165. Academic Press, 1989

work page 1989
[46]

Distance-based image classification: Generalizing to new classes at near-zero cost.IEEE Transactions on Pattern Analysis and Machine Intelligence, 35:2624–37, 11 2013

Thomas Mensink, Jakob Verbeek, and Gabriela Csurka. Distance-based image classification: Generalizing to new classes at near-zero cost.IEEE Transactions on Pattern Analysis and Machine Intelligence, 35:2624–37, 11 2013

work page 2013
[47]

Boosted dyadic kernel discriminants

Baback Moghaddam and Gregory Shakhnarovich. Boosted dyadic kernel discriminants. In S. Becker, S. Thrun, and K. Obermayer, editors,Advances in Neural Information Processing Systems, volume 15. MIT Press, 2002

work page 2002
[48]

Parisi, Ronald Kemker, Jose L

German I. Parisi, Ronald Kemker, Jose L. Part, Christopher Kanan, and Stefan Wermter. Continual lifelong learning with neural networks: A review.Neural Networks, 113:54–71, 2019. 13

work page 2019
[49]

Latent replay for real-time continual learning

Lorenzo Pellegrini, Gabriele Graffieti, Vincenzo Lomonaco, and Davide Maltoni. Latent replay for real-time continual learning. In2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), page 10203–10209. IEEE Press, 2020

work page 2020
[50]

Fetril: Feature translation for exemplar-free class-incremental learning

Grégoire Petit, Adrian Popescu, Hugo Schindler, David Picard, and Bertrand Delezoide. Fetril: Feature translation for exemplar-free class-incremental learning. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 3911– 3920, January 2023

work page 2023
[51]

Ameya Prabhu, Philip H. S. Torr, and Puneet K. Dokania. Gdumb: A simple approach that questions our progress in continual learning. InComputer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II, page 524–540, Berlin, Heidelberg, 2020. Springer-Verlag

work page 2020
[52]

Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions.Psychological Review, 97:285–308, 04 1990

Roger Ratcliff. Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions.Psychological Review, 97:285–308, 04 1990

work page 1990
[53]

Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. icarl: Incremental classifier and representation learning. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017

work page 2017
[54]

Lillicrap, and Greg Wayne.Experi- ence replay for continual learning

David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy P. Lillicrap, and Greg Wayne.Experi- ence replay for continual learning. Curran Associates Inc., Red Hook, NY , USA, 2019

work page 2019
[55]

Progressive Neural Networks

Andrei Rusu, Neil Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. Progressive neural networks.arXiv preprint arXiv:1606.04671, 06 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[56]

Decision trees for mining data streams based on the mcdiarmid’s bound.IEEE Transactions on Knowledge and Data Engineering, 25(6):1272–1279, 2013

Leszek Rutkowski, Lena Pietruczuk, Piotr Duda, and Maciej Jaworski. Decision trees for mining data streams based on the mcdiarmid’s bound.IEEE Transactions on Knowledge and Data Engineering, 25(6):1272–1279, 2013

work page 2013
[57]

Progress & compress: A scalable framework for continual learning

Jonathan Schwarz, Wojciech Czarnecki, Jelena Luketina, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, and Raia Hadsell. Progress & compress: A scalable framework for continual learning. In Jennifer Dy and Andreas Krause, editors,Proceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Re...

work page 2018
[58]

Prototype reminiscence and augmented asymmetric knowledge aggregation for non-exemplar class-incremental learning

Wuxuan Shi and Mang Ye. Prototype reminiscence and augmented asymmetric knowledge aggregation for non-exemplar class-incremental learning. In2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 1772–1781, 2023

[59]

Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. Continual learning with deep generative replay. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017

[60]

James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, and Zsolt Kira. CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11909–11919, Los Al...

[61]

Gido van de Ven, Tinne Tuytelaars, and Andreas Tolias. Three types of incremental learning. Nature Machine Intelligence, 4:1–13, 12 2022

[62]

Gido M. van de Ven and Andreas S. Tolias. Three scenarios for continual learning. CoRR, abs/1904.07734, 2019

[63]

Yuandou Wang, Filip Gunnarsson, and Rihan Hai. An attention-based feature memory design for energy-efficient continual learning, 2025

[64]

Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, and Tomas Pfister. DualPrompt: Complementary prompting for rehearsal-free continual learning. In Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, and Tal Hassner, editors, Computer Vision – ECCV 2022, pag...

[65]

Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, and Tomas Pfister. Learning to prompt for continual learning. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 139–149, 2022

[66]

Jaehong Yoon, Eunho Yang, Jeongtae Lee, and Sung Ju Hwang. Lifelong learning with dynamically expandable networks. In International Conference on Learning Representations (ICLR), 2018

[67]

Michał Zając, Tinne Tuytelaars, and Gido van de Ven. Prediction error-based classification for class-incremental learning. In B. Kim, Y. Yue, S. Chaudhuri, K. Fragkiadaki, M. Khan, and Y. Sun, editors, International Conference on Learning Representations, volume 2024, pages 8239–8267, 2024

[68]

Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3987–3995. PMLR, 06–11 Aug 2017

[69]

Fei Zhu, Xu-Yao Zhang, Chuang Wang, Fei Yin, and Cheng-Lin Liu. Prototype augmentation and self-supervision for incremental learning. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5867–5876, 2021

[70]

Kai Zhu, Wei Zhai, Yang Cao, Jiebo Luo, and Zheng-Jun Zha. Self-sustaining representation expansion for non-exemplar class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9296–9305, June 2022

[71]

Huiping Zhuang, Yizhu Chen, Di Fang, Run He, Kai Tong, Hongxin Wei, Ziqian Zeng, and Cen Chen. GACL: Exemplar-free generalized analytic continual learning. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS ’24, Red Hook, NY, USA, 2024. Curran Associates Inc

[72]

Huiping Zhuang, Zhenyu Weng, Hongxin Wei, Renchuzi Xie, Kar-Ann Toh, and Zhiping Lin. ACIL: Analytic class-incremental learning with absolute memorization and privacy protection. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 11602–11614. Curran Associates, ...
