MANGO: Meta-Adaptive Network Gradient Optimization for Online Continual Learning

Ankita Awasthi; Kaushik Roy; Marco Apolinario

arxiv: 2605.19080 · v1 · pith:2XHBJ3XEnew · submitted 2026-05-18 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-20 12:30 UTC · model grok-4.3

keywords: online continual learning, meta-learning, gradient optimization, catastrophic forgetting, replay buffer, stability-plasticity trade-off

The pith

MANGO uses gradient-gating and meta-learned regularization to balance stability and plasticity in online continual learning from data streams.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In online continual learning, models must learn from a single-pass data stream with limited replay memory while avoiding catastrophic forgetting of previous tasks. The paper introduces MANGO, which applies gradient-gating to scale updates according to parameter sensitivity and meta-learned regularization to adjust stability based on how updates affect replay samples. This setup lets the replay buffer evaluate forgetting in addition to providing training data. Experiments across multiple benchmarks demonstrate superior accuracy and positive backward transfer compared to existing approaches.

Core claim

MANGO is an OCL framework that balances stability and plasticity via gradient-gating and meta-learned regularization. Gradient-gating scales parameter updates according to parameter sensitivity, preventing destructive updates. Meta-learned regularization adapts the stability coefficients by evaluating how a parameter update affects replay samples. Replay therefore acts as both a training signal and a forgetting evaluator. The paper reports that MANGO outperforms strong baselines on standard OCL benchmarks.
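As a rough illustration of the gating idea, the sketch below scales each gradient component by a sigmoid gate in (0, 1). The source does not give the paper's sensitivity measure or gate parameterization, so `sensitivities`, `threshold`, and `sharpness` are hypothetical stand-ins; only the per-parameter sigmoid form is taken from the Figure 2 caption.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gate_gradients(grads, sensitivities, threshold=1.0, sharpness=4.0):
    """Scale each gradient by a sigmoid gate.

    Assumption (not from the paper): the gate approaches 0 when a
    parameter's sensitivity exceeds `threshold` and approaches 1 when
    it is well below it. `sharpness` controls how abrupt the switch is.
    """
    gates = [sigmoid(-sharpness * (s - threshold)) for s in sensitivities]
    gated = [g * a for g, a in zip(grads, gates)]
    return gated, gates


# Hypothetical values, chosen only to show the three regimes.
grads = [0.5, -0.2, 0.8]
sensitivities = [0.1, 1.0, 3.0]  # low, borderline, high sensitivity
gated, gates = gate_gradients(grads, sensitivities)
```

With these placeholder inputs, the low-sensitivity parameter keeps almost all of its gradient, the borderline one keeps half, and the high-sensitivity one is nearly frozen.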

What carries the argument

The combination of gradient-gating, which scales parameter updates based on sensitivity to prevent destructive changes, and meta-learned regularization, which adapts stability coefficients by evaluating the effect of updates on replay data.
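The feedback loop described here (virtual update, evaluation on replay, adaptation of the stability coefficients, final update) can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not the authors' implementation: quadratic losses replace the real network losses, every "layer" is a single scalar, gating is omitted, and `meta_lr` and the clamp at zero are hypothetical choices. The penalty term follows the objective quoted elsewhere on this page, L_train = L_CE + Σ λ_i/2 ||θ_i − θ_old_i||².

```python
def quad_grad(theta, target):
    """Gradient of 0.5 * sum((theta - target)^2), a toy stand-in loss."""
    return [t - c for t, c in zip(theta, target)]


def meta_adaptive_step(theta, theta_old, lam, stream_target, replay_target,
                       lr=0.1, meta_lr=0.5):
    drift = [t - o for t, o in zip(theta, theta_old)]
    g_stream = quad_grad(theta, stream_target)

    # 1. Virtual update theta' using the current coefficients lam.
    g_train = [g + l * d for g, l, d in zip(g_stream, lam, drift)]
    theta_virtual = [t - lr * g for t, g in zip(theta, g_train)]

    # 2. Meta loss: evaluate the virtual parameters on replay.
    g_meta = quad_grad(theta_virtual, replay_target)

    # 3. Chain rule: d theta'_i / d lam_i = -lr * drift_i.
    dlam = [gm * (-lr) * d for gm, d in zip(g_meta, drift)]
    # Assumption: plain gradient step on lam, clamped to stay non-negative.
    new_lam = [max(0.0, l - meta_lr * dl) for l, dl in zip(lam, dlam)]

    # 4. Final update with the adapted coefficients.
    g_final = [g + l * d for g, l, d in zip(g_stream, new_lam, drift)]
    new_theta = [t - lr * g for t, g in zip(theta, g_final)]
    return new_theta, new_lam


# Hypothetical setup: the stream pulls parameter 0 away from the replay optimum.
new_theta, new_lam = meta_adaptive_step(
    theta=[1.0, 0.0], theta_old=[0.0, 0.0], lam=[0.1, 0.1],
    stream_target=[2.0, 0.0], replay_target=[0.0, 0.0],
)
```

In this toy run the coefficient for parameter 0 grows, because the virtual update moves it further from the replay optimum, while the coefficient for parameter 1 is unchanged since it has not drifted from its anchor.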

Load-bearing premise

Meta-learned regularization can reliably evaluate the effect of parameter updates on replay data without its own bias or need for unreported hyperparameter tuning.

What would settle it

Running the same experiments on CLEAR-10, CIFAR-100 and Tiny-ImageNet and finding that MANGO does not outperform the baselines or fails to achieve positive backward transfer on CLEAR-10.
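For readers who want to check the backward-transfer part of this test, the sketch below computes Backward Transfer from an accuracy matrix. It assumes the commonly used definition (average change in accuracy on earlier tasks between the moment each was learned and the end of training); the source does not state which definition the paper uses, and the matrix values are made-up placeholders, not results from the paper.

```python
def backward_transfer(acc):
    """Average backward transfer.

    acc[i][j] is the accuracy on task j measured after training
    through task i (only entries with j <= i are used).
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        raise ValueError("backward transfer needs at least two tasks")
    last = num_tasks - 1
    return sum(acc[last][j] - acc[j][j] for j in range(last)) / last


# Placeholder accuracies for three tasks (hypothetical, for illustration only).
acc = [
    [0.80, 0.00, 0.00],
    [0.82, 0.78, 0.00],
    [0.83, 0.80, 0.75],
]
bwt = backward_transfer(acc)
```

A positive value means accuracy on earlier tasks improved after later training; a negative value indicates forgetting.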

Figures

Figures reproduced from arXiv: 2605.19080 by Ankita Awasthi, Kaushik Roy, Marco Apolinario.

**Figure 2.** (Left) Input stream and replay samples compute the training loss through ResNet-18. (Right) Gradients are modulated by per-parameter sigmoid gating to suppress harmful updates. The gated gradient forms a virtual update θ′, evaluated on replay samples to compute L_meta and adapt the layer-wise stability coefficients λ before the final parameter update.

**Figure 3.** From left to right: The first two panels plot the dynamic, layer-wise evolution of the…

Original abstract

In Online Continual Learning (OCL), a neural network sequentially learns from a non-stationary data stream in a single-pass with access only to a limited memory replay buffer. This contrasts sharply with off-line continual learning where training is multiple epoch dependent on large datasets. The main challenge faced by OCL is to overcome catastrophic forgetting of past tasks (stability) while learning new ones efficiently (plasticity). Existing methods counter forgetting via replay-based rehearsal, output level distillation, fixed regularization, or meta-learning on the current data. However, these methods have limitations: rehearsal introduces a stored sample bias; distillation operates on output-distributions without modulating parameter updates; fixed-regularization penalizes parameters irrespective of sensitivity; stream-only meta-learning lacks a feedback controlled parameter update. We propose Meta-Adaptive Network Gradient Optimization (MANGO), an OCL framework that balances stability-plasticity via gradient-gating and meta-learned regularization. Gradient-gating scales parameter updates based on sensitivity, preventing destructive updates. Meta-learned regularization adapts stability coefficients, evaluating the effect of parameter update on replay. In MANGO, replay acts as both a training signal and a forgetting evaluator. We evaluated our method on three standard OCL benchmark datasets. MANGO outperforms strong baselines, achieving state-of-the-art results with consistent performance across replay sizes. In domain incremental learning on CLEAR-10 and class incremental learning on CIFAR-100 and Tiny-ImageNet, it achieves highest accuracy among all baselines and achieves positive Backward Transfer, overcoming forgetting on CLEAR-10.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MANGO pairs gradient gating with meta-regularization on replay to manage stability-plasticity in online continual learning and reports SOTA accuracy plus positive backward transfer, but the dual use of the same buffer for training and evaluation leaves the stability claims open to circularity concerns.

The main takeaway is that MANGO uses gradient gating to scale updates by parameter sensitivity and a meta-learner to adjust regularization by checking how those updates affect replay samples. It claims top accuracy on CLEAR-10 domain-incremental learning and on class-incremental CIFAR-100 and Tiny-ImageNet, with positive backward transfer that reduces forgetting on CLEAR-10 and steady results across replay sizes.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Meta-Adaptive Network Gradient Optimization (MANGO) for online continual learning (OCL). It combines gradient-gating, which scales parameter updates according to sensitivity to prevent destructive changes, with meta-learned regularization that adapts stability coefficients by evaluating the effect of updates on replay data. The authors claim that replay serves as both training signal and forgetting evaluator, enabling MANGO to achieve state-of-the-art accuracy on domain-incremental learning (CLEAR-10) and class-incremental learning (CIFAR-100, Tiny-ImageNet), with consistent performance across replay sizes and positive backward transfer on CLEAR-10 that overcomes forgetting.

Significance. If the empirical claims hold under independent validation, MANGO would offer a practical advance in OCL by providing an adaptive, feedback-controlled mechanism for the stability-plasticity trade-off that avoids the biases of fixed regularization or output-only distillation. The explicit use of meta-learning to modulate gradient updates based on replay evaluation is a technically interesting direction that could generalize to other streaming settings.

major comments (2)

[Abstract and §3 (Method)] The central stability claim rests on meta-learned regularization that 'evaluates the effect of parameter update on replay' while 'replay acts as both a training signal and a forgetting evaluator.' Because the identical replay buffer supplies both the rehearsal gradients and the meta-evaluation signal, the evaluator is not independent of the update being judged. This shared usage risks the meta-learner implicitly minimizing its own reported forgetting metric rather than measuring true stability; a concrete diagnostic (e.g., meta-evaluation on a held-out subset of past data never seen during the current update) is needed to secure the positive Backward Transfer result.
[§4 (Experiments)] The abstract asserts 'highest accuracy among all baselines' and 'positive Backward Transfer' on CLEAR-10, yet supplies no information on the number of independent runs, statistical significance tests, variance across seeds, or the precise loss formulation and hyper-parameters of the meta-regularizer. Without these controls it is impossible to determine whether the reported SOTA margins are robust or sensitive to implementation details.
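The held-out diagnostic requested in the first major comment could be set up along these lines. This is a minimal sketch, assuming the replay buffer can be treated as a plain list; `split_replay` and `holdout_fraction` are hypothetical names, and a buffer maintained by reservoir sampling would need the split tracked per slot as samples are replaced.

```python
import random


def split_replay(buffer, holdout_fraction=0.2, seed=0):
    """Partition a replay buffer into disjoint rehearsal and meta-eval parts.

    The meta-eval part is never used for rehearsal gradients, so the
    forgetting signal is computed on samples the update did not train on.
    """
    rng = random.Random(seed)
    indices = list(range(len(buffer)))
    rng.shuffle(indices)
    n_hold = max(1, int(len(buffer) * holdout_fraction))
    held_out = set(indices[:n_hold])
    rehearsal = [x for i, x in enumerate(buffer) if i not in held_out]
    meta_eval = [x for i, x in enumerate(buffer) if i in held_out]
    return rehearsal, meta_eval


# Integers stand in for stored (input, label) pairs.
replay_buffer = list(range(50))
rehearsal, meta_eval = split_replay(replay_buffer)
```

The trade-off is that the rehearsal set shrinks by the held-out fraction, which matters most at the small replay sizes the paper evaluates.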

minor comments (2)

[§3] Clarify the exact mathematical definition of the meta-regularizer (e.g., how the stability coefficient is computed from the replay evaluation) and ensure all symbols are introduced before first use.
[§4] Add a short ablation isolating the contribution of gradient-gating versus the meta-regularizer to the overall performance gain.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Referee: [Abstract and §3 (Method)] The central stability claim rests on meta-learned regularization that 'evaluates the effect of parameter update on replay' while 'replay acts as both a training signal and a forgetting evaluator.' Because the identical replay buffer supplies both the rehearsal gradients and the meta-evaluation signal, the evaluator is not independent of the update being judged. This shared usage risks the meta-learner implicitly minimizing its own reported forgetting metric rather than measuring true stability; a concrete diagnostic (e.g., meta-evaluation on a held-out subset of past data never seen during the current update) is needed to secure the positive Backward Transfer result.

Authors: We appreciate the referee's observation on the shared use of the replay buffer. In MANGO the meta-regularizer is explicitly designed to evaluate the effect of a candidate update on replay performance in order to adapt the stability coefficient for that step, creating an online feedback loop. This is intentional and enables the method to operate without extra memory. Nevertheless, to rule out any risk of self-minimization and to further substantiate the reported positive backward transfer, we will add a diagnostic experiment in the revised manuscript that reserves a held-out subset of replay samples exclusively for meta-evaluation and never uses them for the current rehearsal gradients or update. Revision: yes.

Referee: [§4 (Experiments)] The abstract asserts 'highest accuracy among all baselines' and 'positive Backward Transfer' on CLEAR-10, yet supplies no information on the number of independent runs, statistical significance tests, variance across seeds, or the precise loss formulation and hyper-parameters of the meta-regularizer. Without these controls it is impossible to determine whether the reported SOTA margins are robust or sensitive to implementation details.

Authors: We agree that these experimental details are necessary for assessing robustness. In the revised manuscript we will report results over multiple independent runs with different random seeds, include mean and standard deviation, perform and report statistical significance tests against baselines, and provide the exact loss formulation together with all hyper-parameters of the meta-regularizer in the main text or a dedicated appendix. Revision: yes.
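The reporting promised in this response might look like the sketch below: per-seed mean and standard deviation, plus an exact paired sign-flip permutation test. The choice of test is an assumption (the rebuttal does not name one), and the accuracy lists are made-up placeholders, not numbers from the paper.

```python
import itertools
import statistics


def paired_sign_flip_pvalue(a, b):
    """Two-sided exact sign-flip permutation test on paired differences.

    Enumerates all 2^n sign assignments, so it is only practical for a
    small number of seeds.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Placeholder per-seed accuracies (hypothetical, five seeds each).
method_acc = [0.61, 0.63, 0.60, 0.64, 0.62]
baseline_acc = [0.58, 0.59, 0.59, 0.60, 0.58]

mean_acc = statistics.mean(method_acc)
std_acc = statistics.stdev(method_acc)  # sample standard deviation
p_value = paired_sign_flip_pvalue(method_acc, baseline_acc)
```

With five seeds the smallest attainable two-sided p-value is 2/32, which illustrates why the number of independent runs matters for any significance claim.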

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper describes an empirical OCL method (MANGO) that uses the replay buffer both for rehearsal gradients and for meta-regularization that adapts the stability coefficients. This is a stated design choice rather than a mathematical derivation or prediction that reduces to its inputs by construction. No equations, self-citations, or uniqueness theorems are quoted that would force the central claims (positive backward transfer, SOTA accuracy) to be tautological. Performance results are presented as experimental outcomes on CLEAR-10, CIFAR-100 and Tiny-ImageNet, which remain externally falsifiable. The shared use of replay data is explicit but does not constitute circularity under the defined patterns because it is neither a fitted parameter renamed as a prediction nor a self-referential definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities that can be extracted with certainty.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "In MANGO, replay acts as both a training signal and a forgetting evaluator... Meta-learned regularization adapts stability coefficients, evaluating the effect of parameter update on replay."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: "Gradient-gating scales parameter updates based on sensitivity... L_train = L_CE + Σ λ_i/2 ||θ_i − θ_old_i||²"
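
The training objective quoted in the second passage can be written out explicitly, and differentiating it shows how each stability coefficient enters the update. This is a worked restatement of the quoted formula only; the index i is read here as ranging over layers (following the Figure 2 caption's "layer-wise stability coefficients"), which is an interpretation rather than something the quoted passage states.

```latex
\mathcal{L}_{\text{train}}(\theta)
  = \mathcal{L}_{\text{CE}}(\theta)
  + \sum_i \frac{\lambda_i}{2}\,\bigl\lVert \theta_i - \theta_i^{\text{old}} \bigr\rVert^2

\nabla_{\theta_i}\mathcal{L}_{\text{train}}
  = \nabla_{\theta_i}\mathcal{L}_{\text{CE}}
  + \lambda_i \bigl(\theta_i - \theta_i^{\text{old}}\bigr)
```

A larger λ_i pulls layer i more strongly back toward its anchor θ_i^old (more stability), while λ_i = 0 recovers the plain cross-entropy gradient (full plasticity).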

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

