pith.

arxiv: 2605.19080 · v1 · pith:2XHBJ3XE · submitted 2026-05-18 · 💻 cs.LG · cs.AI

MANGO: Meta-Adaptive Network Gradient Optimization for Online Continual Learning

Pith reviewed 2026-05-20 12:30 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: online continual learning · meta-learning · gradient optimization · catastrophic forgetting · replay buffer · stability-plasticity trade-off

The pith

MANGO uses gradient-gating and meta-learned regularization to balance stability and plasticity in online continual learning from data streams.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In online continual learning, models must learn from a single-pass data stream with limited replay memory while avoiding catastrophic forgetting of previous tasks. The paper introduces MANGO, which uses gradient-gating to scale updates according to parameter sensitivity and meta-learned regularization to adjust stability based on how updates affect replay samples. This setup lets the replay buffer evaluate forgetting in addition to providing training data. Experiments on CLEAR-10, CIFAR-100 and Tiny-ImageNet report the highest accuracy among the compared baselines, with positive backward transfer on CLEAR-10.

Core claim

MANGO is an OCL framework that balances stability and plasticity via gradient-gating and meta-learned regularization. Gradient-gating scales parameter updates based on sensitivity, preventing destructive updates. Meta-learned regularization adapts stability coefficients by evaluating the effect of each parameter update on replay samples. In MANGO, replay acts as both a training signal and a forgetting evaluator. The method outperforms strong baselines on standard OCL benchmarks.

What carries the argument

The combination of gradient-gating, which scales parameter updates based on sensitivity to prevent destructive changes, and meta-learned regularization, which adapts stability coefficients by evaluating the effect of updates on replay data.
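The gating half of this mechanism can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the sensitivity proxy (an exponential moving average of squared gradients, in the spirit of a diagonal Fisher estimate), the gate shape `sigmoid(-temperature * (s - threshold))`, and the names `update_sensitivity`, `gate_gradients`, `temperature` and `threshold` are all assumptions. The source only states that updates are scaled by a per-parameter sigmoid gate driven by sensitivity.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def update_sensitivity(sens: List[float], grads: List[float],
                       decay: float = 0.9) -> List[float]:
    """Hypothetical sensitivity proxy: EMA of squared gradients."""
    return [decay * s + (1.0 - decay) * g * g for s, g in zip(sens, grads)]


def gate_gradients(grads: List[float], sens: List[float],
                   temperature: float = 50.0,
                   threshold: float = 0.1) -> Tuple[List[float], List[float]]:
    """Scale each gradient by a per-parameter sigmoid gate in (0, 1).

    The gate is close to 1 for low sensitivity, equals 0.5 at `threshold`,
    and approaches 0 for highly sensitive parameters.
    """
    gates = [sigmoid(-temperature * (s - threshold)) for s in sens]
    gated = [g * a for g, a in zip(grads, gates)]
    return gated, gates


# Usage: the second parameter has seen large gradients, so it counts as
# sensitive and its step is almost fully suppressed.
sens = update_sensitivity([0.0, 0.0], [0.1, 3.0])  # approx [0.001, 0.9]
gated, gates = gate_gradients([1.0, 1.0], sens)
```

The gate falls monotonically with sensitivity, so `temperature` only controls how sharply suppression switches on around `threshold`.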

Load-bearing premise

Meta-learned regularization can reliably evaluate the effect of parameter updates on replay data without introducing its own bias or requiring unreported hyperparameter tuning.

What would settle it

Running the same experiments on CLEAR-10, CIFAR-100 and Tiny-ImageNet and finding that MANGO does not outperform the baselines or fails to achieve positive backward transfer on CLEAR-10.
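Backward transfer (BWT) is the quantity this test hinges on. A common definition, popularized by GEM (Lopez-Paz and Ranzato, 2017), is sketched below from an accuracy matrix; whether the paper uses exactly this form is an assumption, and the matrix values are made up for illustration.

```python
from typing import List


def average_accuracy(R: List[List[float]]) -> float:
    """Mean accuracy over all T tasks after training on the last task.

    R[i][j] is test accuracy on task j measured right after training on task i.
    """
    T = len(R)
    return sum(R[T - 1][j] for j in range(T)) / T


def backward_transfer(R: List[List[float]]) -> float:
    """BWT = 1/(T-1) * sum over j < T-1 of (R[T-1][j] - R[j][j]).

    Positive values mean later training improved earlier tasks;
    negative values indicate forgetting.
    """
    T = len(R)
    if T < 2:
        raise ValueError("BWT needs at least two tasks")
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


# Illustrative 3-task matrix (made-up numbers, not results from the paper).
R = [
    [0.80, 0.10, 0.10],
    [0.82, 0.70, 0.10],
    [0.85, 0.72, 0.60],
]
print(round(backward_transfer(R), 3))  # 0.035
print(round(average_accuracy(R), 3))   # 0.723
```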

Figures

Figures reproduced from arXiv: 2605.19080 by Ankita Awasthi, Kaushik Roy, Marco Apolinario.

Figure 1. From left to right: task-wise accuracy trajectories for MANGO evaluated on CIFAR…
Figure 2. (Left) Input stream and replay samples compute the training loss through ResNet-18. (Right) Gradients are modulated by per-parameter sigmoid gating to suppress harmful updates. The gated gradient forms a virtual update θ′, which is evaluated on replay samples to compute L_meta and adapt the layer-wise stability coefficients λ before the final parameter update.
Figure 3. From left to right: the first two panels plot the dynamic, layer-wise evolution of the…
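The loop in the Figure 2 caption (gated gradient, virtual update θ′, L_meta on replay, layer-wise λ adaptation, then the final update) can be made concrete as a one-step hypergradient in the style of MAML-like meta-learning. This is a hedged sketch under stated assumptions: the quadratic penalty (λ_l/2)·||θ_l − θ*_l||² toward a hypothetical anchor θ*, the plain SGD virtual step with rate `alpha`, the meta learning rate `beta`, clamping λ at zero, and all function names are illustrative choices rather than the paper's formulation. Gates are taken as fixed inputs from the gating step.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, List[float]]       # layer name -> flat list of parameters
GradFn = Callable[[Params], Params]   # gradient of a loss, same layout


def virtual_update(theta: Params, anchor: Params, lam: Dict[str, float],
                   gates: Params, grad_train: GradFn, alpha: float) -> Params:
    """theta' = theta - alpha * gate * (grad_train(theta) + lam_l * (theta - anchor))."""
    g = grad_train(theta)
    return {
        name: [w - alpha * a * (d + lam[name] * (w - w0))
               for w, w0, a, d in zip(theta[name], anchor[name], gates[name], g[name])]
        for name in theta
    }


def lambda_hypergrad(theta: Params, anchor: Params, lam: Dict[str, float],
                     gates: Params, grad_train: GradFn, grad_meta: GradFn,
                     alpha: float) -> Dict[str, float]:
    """dL_meta(theta')/dlam_l via the chain rule.

    With gates and grad_train(theta) fixed, d theta'_l / d lam_l equals
    -alpha * gate_l * (theta_l - anchor_l).
    """
    theta_v = virtual_update(theta, anchor, lam, gates, grad_train, alpha)
    gm = grad_meta(theta_v)  # replay-loss gradient at the virtual parameters
    return {
        name: sum(m * (-alpha * a * (w - w0))
                  for m, a, w, w0 in zip(gm[name], gates[name], theta[name], anchor[name]))
        for name in theta
    }


def meta_step(theta: Params, anchor: Params, lam: Dict[str, float],
              gates: Params, grad_train: GradFn, grad_meta: GradFn,
              alpha: float = 0.1, beta: float = 1.0) -> Tuple[Params, Dict[str, float]]:
    """Adapt lam on the virtual update, then apply the real update with the new lam."""
    hg = lambda_hypergrad(theta, anchor, lam, gates, grad_train, grad_meta, alpha)
    new_lam = {name: max(0.0, lam[name] - beta * hg[name]) for name in lam}  # keep lam >= 0
    new_theta = virtual_update(theta, anchor, new_lam, gates, grad_train, alpha)
    return new_theta, new_lam


# Toy quadratic losses: the stream pulls "fc" toward 3.0, while replay
# (the old task) prefers 0.0, which is also where the anchor sits.
def toy_grad_train(p: Params) -> Params:
    return {"fc": [w - 3.0 for w in p["fc"]]}


def toy_grad_meta(p: Params) -> Params:
    return {"fc": [w - 0.0 for w in p["fc"]]}


theta, lam = meta_step({"fc": [1.0]}, {"fc": [0.0]}, {"fc": 0.0},
                       {"fc": [1.0]}, toy_grad_train, toy_grad_meta)
# lam["fc"] ~= 0.12 and theta["fc"][0] ~= 1.188
```

Because the gates and grad_train(θ) are evaluated at θ, θ′ is linear in each λ_l, so this one-step hypergradient needs no second-order terms; differentiating through several virtual steps, or making the gates depend on λ, would require them. In the toy run the stream update moves θ away from the replay optimum, so the hypergradient is negative and λ grows from 0 to 0.12, tightening stability.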
Original abstract

In Online Continual Learning (OCL), a neural network learns sequentially from a non-stationary data stream in a single pass, with access only to a limited replay memory buffer. This contrasts sharply with offline continual learning, where training relies on multiple epochs over large datasets. The main challenge in OCL is to overcome catastrophic forgetting of past tasks (stability) while learning new ones efficiently (plasticity). Existing methods counter forgetting via replay-based rehearsal, output-level distillation, fixed regularization, or meta-learning on the current data. However, these methods have limitations: rehearsal introduces a stored-sample bias; distillation operates on output distributions without modulating parameter updates; fixed regularization penalizes parameters irrespective of their sensitivity; and stream-only meta-learning lacks a feedback-controlled parameter update. We propose Meta-Adaptive Network Gradient Optimization (MANGO), an OCL framework that balances stability and plasticity via gradient-gating and meta-learned regularization. Gradient-gating scales parameter updates based on sensitivity, preventing destructive updates. Meta-learned regularization adapts stability coefficients by evaluating the effect of each parameter update on replay. In MANGO, replay acts as both a training signal and a forgetting evaluator. We evaluated our method on three standard OCL benchmark datasets. MANGO outperforms strong baselines, achieving state-of-the-art results with consistent performance across replay sizes. In domain-incremental learning on CLEAR-10 and class-incremental learning on CIFAR-100 and Tiny-ImageNet, it achieves the highest accuracy among all baselines, and on CLEAR-10 it achieves positive Backward Transfer, overcoming forgetting.
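A fixed-size replay buffer filled from a single-pass stream is commonly maintained with reservoir sampling, and the paper's reference list includes Vitter's 1985 reservoir-sampling paper. Whether MANGO uses exactly this policy is an assumption here, and `ReservoirBuffer` is a hypothetical name. The sketch implements the classic Algorithm R rule: after `n` items have been seen, each one is in the buffer with probability `capacity / n`.

```python
import random
from typing import Generic, List, TypeVar

T = TypeVar("T")


class ReservoirBuffer(Generic[T]):
    """Fixed-capacity replay memory filled by reservoir sampling (Algorithm R)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0                      # number of stream items offered so far
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        """Offer one stream item; it is kept with probability min(1, capacity / seen)."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)        # buffer not full yet: always keep
        else:
            j = self.rng.randrange(self.seen)  # uniform over 0 .. seen - 1
            if j < self.capacity:
                self.items[j] = item       # overwrite a uniformly chosen slot

    def sample(self, k: int) -> List[T]:
        """Draw up to k distinct stored items for a replay mini-batch."""
        return self.rng.sample(self.items, min(k, len(self.items)))


buf: ReservoirBuffer[int] = ReservoirBuffer(capacity=5, seed=0)
for x in range(100):
    buf.add(x)
replay_batch = buf.sample(3)
```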

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Meta-Adaptive Network Gradient Optimization (MANGO) for online continual learning (OCL). It combines gradient-gating, which scales parameter updates according to sensitivity to prevent destructive changes, with meta-learned regularization that adapts stability coefficients by evaluating the effect of updates on replay data. The authors claim that replay serves as both training signal and forgetting evaluator, enabling MANGO to achieve state-of-the-art accuracy on domain-incremental learning (CLEAR-10) and class-incremental learning (CIFAR-100, Tiny-ImageNet), with consistent performance across replay sizes and positive backward transfer on CLEAR-10 that overcomes forgetting.

Significance. If the empirical claims hold under independent validation, MANGO would offer a practical advance in OCL by providing an adaptive, feedback-controlled mechanism for the stability-plasticity trade-off that avoids the biases of fixed regularization or output-only distillation. The explicit use of meta-learning to modulate gradient updates based on replay evaluation is a technically interesting direction that could generalize to other streaming settings.

major comments (2)
  1. [Abstract and §3 (Method)] The central stability claim rests on meta-learned regularization that 'evaluates the effect of parameter update on replay' while 'replay acts as both a training signal and a forgetting evaluator.' Because the identical replay buffer supplies both the rehearsal gradients and the meta-evaluation signal, the evaluator is not independent of the update being judged. This shared usage risks the meta-learner implicitly minimizing its own reported forgetting metric rather than measuring true stability; a concrete diagnostic (e.g., meta-evaluation on a held-out subset of past data never seen during the current update) is needed to secure the positive Backward Transfer result.
  2. [§4 (Experiments)] The abstract asserts 'highest accuracy among all baselines' and 'positive Backward Transfer' on CLEAR-10, yet supplies no information on the number of independent runs, statistical significance tests, variance across seeds, or the precise loss formulation and hyper-parameters of the meta-regularizer. Without these controls it is impossible to determine whether the reported SOTA margins are robust or sensitive to implementation details.
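The held-out diagnostic requested in comment 1 amounts to partitioning the replay buffer so that meta-evaluation never reads samples used for rehearsal gradients. A minimal sketch, assuming slot-level partitioning fixed once with a seeded shuffle; the function name and `holdout_frac` are hypothetical:

```python
import random
from typing import List, Tuple


def partition_replay_slots(buffer_size: int, holdout_frac: float = 0.2,
                           seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split buffer slot indices into disjoint (rehearsal, meta_eval) sets.

    The split is fixed once, so a slot reserved for meta-evaluation never
    contributes to rehearsal gradients, even as its contents are replaced.
    """
    if buffer_size < 2:
        raise ValueError("need at least two slots to hold one out")
    if not 0.0 < holdout_frac < 1.0:
        raise ValueError("holdout_frac must be in (0, 1)")
    slots = list(range(buffer_size))
    random.Random(seed).shuffle(slots)
    n_hold = min(buffer_size - 1, max(1, round(holdout_frac * buffer_size)))
    return sorted(slots[n_hold:]), sorted(slots[:n_hold])


rehearsal, meta_eval = partition_replay_slots(buffer_size=10, holdout_frac=0.2)
```

Samples stored in held-out slots still pass through the model once as stream data when they arrive; the split only keeps them out of rehearsal, which is what separates the evaluator from the rehearsal signal.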
minor comments (2)
  1. [§3] Clarify the exact mathematical definition of the meta-regularizer (e.g., how the stability coefficient is computed from the replay evaluation) and ensure all symbols are introduced before first use.
  2. [§4] Add a short ablation isolating the contribution of gradient-gating versus the meta-regularizer to the overall performance gain.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee (major comment 1, Abstract and §3): the same replay buffer supplies both rehearsal gradients and the meta-evaluation signal, so a held-out diagnostic is needed.

    Authors: We appreciate the referee's observation on the shared use of the replay buffer. In MANGO the meta-regularizer is explicitly designed to evaluate the effect of a candidate update on replay performance in order to adapt the stability coefficient for that step, creating an online feedback loop. This is intentional and enables the method to operate without extra memory. Nevertheless, to rule out any risk of self-minimization and to further substantiate the reported positive backward transfer, we will add a diagnostic experiment in the revised manuscript that reserves a held-out subset of replay samples exclusively for meta-evaluation and never uses them for the current rehearsal gradients or update. revision: yes

  2. Referee (major comment 2, §4): no run counts, seed variance, significance tests, or meta-regularizer loss and hyper-parameters are reported.

    Authors: We agree that these experimental details are necessary for assessing robustness. In the revised manuscript we will report results over multiple independent runs with different random seeds, include mean and standard deviation, perform and report statistical significance tests against baselines, and provide the exact loss formulation together with all hyper-parameters of the meta-regularizer in the main text or a dedicated appendix. revision: yes
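The promised robustness reporting can be sketched with the standard library: mean and sample standard deviation across seeds, plus Welch's unequal-variance t statistic between two methods. The inputs would be per-seed final accuracies; the values used to check the sketch are arbitrary and not results from the paper. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_std(xs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator) across seeds."""
    return statistics.mean(xs), statistics.stdev(xs)


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    sa = statistics.variance(a) / na   # squared standard error of mean(a)
    sb = statistics.variance(b) / nb   # squared standard error of mean(b)
    se2 = sa + sb
    if se2 == 0:
        raise ValueError("both samples have zero variance")
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df
```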

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper describes an empirical OCL method (MANGO) that uses the replay buffer for both rehearsal gradients and meta-regularization to adapt stability coefficients. This is a stated design choice rather than a mathematical derivation or prediction that reduces to its inputs by construction. No equations, self-citations, or uniqueness theorems are quoted that would force the central claims (positive backward transfer, SOTA accuracy) to be tautological. Performance results are presented as experimental outcomes on CLEAR-10, CIFAR-100 and Tiny-ImageNet, which remain externally falsifiable. The shared use of replay data is explicit but does not constitute circularity under the defined patterns, because it is neither a fitted parameter renamed as a prediction nor a self-referential definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities that can be extracted with certainty.

pith-pipeline@v0.9.0 · 5812 in / 1048 out tokens · 52057 ms · 2026-05-20T12:30:18.136512+00:00 · methodology
