Efficient Lifelong Learning with A-GEM

Arslan Chaudhry , Marc'Aurelio Ranzato , Marcus Rohrbach , Mohamed Elhoseiny


classification: cs.LG, stat.ML

keywords: learning, a-gem, lifelong, tasks, efficiency, efficient, evaluation, even


In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong approaches in terms of sample complexity and computational and memory cost. Towards this end, we first introduce a new and more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small and disjoint set of tasks, which is not used for the actual learning experience and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which matches or even exceeds the performance of GEM while being almost as computationally and memory efficient as EWC (Kirkpatrick et al., 2016) and other regularization-based methods. Finally, we show that all algorithms, including A-GEM, can learn even more quickly if they are provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM has the best trade-off between accuracy and efficiency.
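The abstract does not spell out how A-GEM modifies a training step, so the following is only a minimal sketch of the gradient projection usually associated with the method: when the current-task gradient conflicts with a reference gradient computed on a batch sampled from episodic memory (negative inner product), the conflicting component is removed. The function names `dot` and `agem_project`, and the use of flat Python lists for gradients, are assumptions made for illustration and are not taken from the paper's code.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two flat gradient vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def agem_project(grad: Sequence[float], ref_grad: Sequence[float]) -> List[float]:
    """Sketch of an A-GEM-style projection (hypothetical helper).

    grad     -- gradient of the loss on the current task's mini-batch
    ref_grad -- gradient of the loss on a batch drawn from episodic memory

    If the two gradients do not conflict (inner product >= 0), the
    current gradient is returned unchanged. Otherwise the component of
    `grad` along `ref_grad` is subtracted, so the result has zero inner
    product with `ref_grad`.
    """
    inner = dot(grad, ref_grad)
    if inner >= 0.0:
        return list(grad)
    # inner < 0 implies ref_grad is non-zero, so the division is safe.
    scale = inner / dot(ref_grad, ref_grad)
    return [g - scale * r for g, r in zip(grad, ref_grad)]
```

Under these assumptions, a single averaged reference gradient replaces GEM's one constraint per previous task, which is consistent with the abstract's claim that A-GEM is close to regularization-based methods in compute and memory cost. For example, `agem_project([1.0, -2.0], [0.0, 1.0])` removes the conflicting second component, while a non-conflicting pair is passed through unchanged.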



Forward citations

Cited by 13 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
cs.AI 2023-06 conditional novelty 8.0

LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.

Unlocking Patch-Level Features for CLIP-Based Class-Incremental Learning
cs.CV 2026-05 unverdicted novelty 7.0

SPA unlocks patch-level features in CLIP for class-incremental learning via semantic-guided selection and optimal transport alignment with class descriptions, plus projectors and pseudo-feature replay to reduce forgetting.

DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts
cs.LG 2026-05 accept novelty 7.0

DRIFT is a benchmark for task-free continual graph learning under continuous distribution shifts, demonstrating that standard methods degrade without task boundary information.

MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound
cs.LG 2026-05 unverdicted novelty 7.0

MIST fixes unreliable splits in streaming decision trees for class-incremental learning by using a K-independent McDiarmid bound on Gini impurity, Bayesian moment projection for knowledge transfer, and KLL quantile sk...

Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning
cs.LG 2026-04 unverdicted novelty 7.0

TeLAPA maintains archives of behaviorally diverse yet competent policies aligned in a shared latent space to preserve plasticity and enable faster recovery after interference in continual reinforcement learning.

SLE-FNO: Single-Layer Extensions for Task-Agnostic Continual Learning in Fourier Neural Operators
cs.LG 2026-03 unverdicted novelty 7.0

SLE-FNO achieves zero forgetting and strong plasticity-stability balance in continual learning for FNO surrogate models of pulsatile blood flow by adding minimal single-layer extensions across four out-of-distribution tasks.

DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts
cs.LG 2026-05 unverdicted novelty 6.0

DRIFT benchmark shows substantial performance degradation for continual graph learning methods under task-free continuous distribution shifts modeled via Gaussian mixtures.

CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning
cs.LG 2026-05 unverdicted novelty 6.0

CRAFT is a continual learning method for LLMs that applies low-rank interventions on hidden states, unified by KL divergence for routing similar tasks, regularizing against forgetting, and merging updates, showing red...

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting
cs.LG 2026-05 unverdicted novelty 6.0

Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.

Tracking Adaptation Time: Metrics for Temporal Distribution Shift
cs.LG 2026-04 unverdicted novelty 6.0

Three complementary metrics are introduced to distinguish model adaptation from intrinsic data difficulty under temporal distribution shift.

Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning
cs.LG 2026-05 unverdicted novelty 5.0

Muon-OGD integrates Muon-style spectral-norm geometry with orthogonal gradient constraints to improve the stability-plasticity trade-off during sequential LLM adaptation.

HEDP: A Hybrid Energy-Distance Prompt-based Framework for Domain Incremental Learning
cs.AI 2026-05 unverdicted novelty 5.0

HEDP uses energy regularization inspired by Helmholtz free energy plus hybrid energy-distance weighting in prompts to improve domain selection and achieve a 2.57% accuracy gain on benchmarks like CORe50 while mitigati...

CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning
cs.LG 2026-05 unverdicted novelty 5.0

CRAFT is a continual learning method for LLMs that learns low-rank interventions on hidden representations, using a unified KL-divergence objective to handle task routing by output divergence, forgetting control via p...