pith. machine review for the scientific record.

arxiv: 1812.00420 · v2 · submitted 2018-12-02 · 💻 cs.LG · stat.ML

Recognition: unknown

Efficient Lifelong Learning with A-GEM

classification 💻 cs.LG stat.ML
keywords: learning, a-gem, lifelong, tasks, efficiency, efficient, evaluation, even

In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior that can be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong approaches in terms of sample complexity and computational and memory cost. To this end, we first introduce a new, more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small and disjoint set of tasks that is not used for the actual learning experience and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which matches or even exceeds the performance of GEM while being almost as computationally and memory efficient as EWC (Kirkpatrick et al., 2016) and other regularization-based methods. Finally, we show that all algorithms, including A-GEM, can learn even more quickly if they are provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM has the best trade-off between accuracy and efficiency.
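
The abstract names A-GEM but does not spell out its update rule. As a rough illustration only, the sketch below shows the kind of gradient projection A-GEM is generally described as using: when the current-task gradient conflicts with a reference gradient averaged over a batch from episodic memory (negative dot product), the conflicting component is removed. The function names `dot` and `agem_project`, the plain-list representation of gradients, and the toy values are assumptions made for this example and are not taken from this page.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two flat gradient vectors (hypothetical helper)."""
    return sum(x * y for x, y in zip(a, b))


def agem_project(grad: List[float], ref_grad: List[float]) -> List[float]:
    """Sketch of an A-GEM-style projection (assumed form, not the authors' code).

    grad:     flattened gradient on the current task's mini-batch
    ref_grad: flattened gradient averaged over a batch sampled from episodic memory
    """
    g_dot_ref = dot(grad, ref_grad)
    # No conflict: the proposed update does not increase the memory loss
    # to first order, so the gradient is returned unchanged.
    if g_dot_ref >= 0.0:
        return list(grad)
    # Conflict: remove the component of grad that points against ref_grad.
    # ref_grad cannot be all zeros here, because the dot product is negative.
    ref_sq = dot(ref_grad, ref_grad)
    scale = g_dot_ref / ref_sq
    return [g - scale * r for g, r in zip(grad, ref_grad)]


# Toy usage: the second coordinate conflicts with the reference direction.
projected = agem_project([1.0, -2.0], [0.0, 1.0])
```

Because only a single averaged reference gradient is involved, this projection needs two inner products per step, which is in keeping with the abstract's claim that A-GEM is nearly as cheap as regularization-based methods.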

This paper has not been read by Pith yet.


Forward citations

Cited by 13 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

    cs.AI 2023-06 conditional novelty 8.0

    LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.

  2. Unlocking Patch-Level Features for CLIP-Based Class-Incremental Learning

    cs.CV 2026-05 unverdicted novelty 7.0

    SPA unlocks patch-level features in CLIP for class-incremental learning via semantic-guided selection and optimal transport alignment with class descriptions, plus projectors and pseudo-feature replay to reduce forgetting.

  3. DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts

    cs.LG 2026-05 accept novelty 7.0

    DRIFT is a benchmark for task-free continual graph learning under continuous distribution shifts, demonstrating that standard methods degrade without task boundary information.

  4. MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound

    cs.LG 2026-05 unverdicted novelty 7.0

    MIST fixes unreliable splits in streaming decision trees for class-incremental learning by using a K-independent McDiarmid bound on Gini impurity, Bayesian moment projection for knowledge transfer, and KLL quantile sk...

  5. Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning

    cs.LG 2026-04 unverdicted novelty 7.0

    TeLAPA maintains archives of behaviorally diverse yet competent policies aligned in a shared latent space to preserve plasticity and enable faster recovery after interference in continual reinforcement learning.

  6. SLE-FNO: Single-Layer Extensions for Task-Agnostic Continual Learning in Fourier Neural Operators

    cs.LG 2026-03 unverdicted novelty 7.0

    SLE-FNO achieves zero forgetting and strong plasticity-stability balance in continual learning for FNO surrogate models of pulsatile blood flow by adding minimal single-layer extensions across four out-of-distribution tasks.

  7. DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts

    cs.LG 2026-05 unverdicted novelty 6.0

    DRIFT benchmark shows substantial performance degradation for continual graph learning methods under task-free continuous distribution shifts modeled via Gaussian mixtures.

  8. CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning

    cs.LG 2026-05 unverdicted novelty 6.0

    CRAFT is a continual learning method for LLMs that applies low-rank interventions on hidden states, unified by KL divergence for routing similar tasks, regularizing against forgetting, and merging updates, showing red...

  9. Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

    cs.LG 2026-05 unverdicted novelty 6.0

    Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.

  10. Tracking Adaptation Time: Metrics for Temporal Distribution Shift

    cs.LG 2026-04 unverdicted novelty 6.0

    Three complementary metrics are introduced to distinguish model adaptation from intrinsic data difficulty under temporal distribution shift.

  11. Muon-OGD: Muon-based Spectral Orthogonal Gradient Projection for LLM Continual Learning

    cs.LG 2026-05 unverdicted novelty 5.0

    Muon-OGD integrates Muon-style spectral-norm geometry with orthogonal gradient constraints to improve the stability-plasticity trade-off during sequential LLM adaptation.

  12. HEDP: A Hybrid Energy-Distance Prompt-based Framework for Domain Incremental Learning

    cs.AI 2026-05 unverdicted novelty 5.0

    HEDP uses energy regularization inspired by Helmholtz free energy plus hybrid energy-distance weighting in prompts to improve domain selection and achieve a 2.57% accuracy gain on benchmarks like CORe50 while mitigati...

  13. CRAFT: Forgetting-Aware Intervention-Based Adaptation for Continual Learning

    cs.LG 2026-05 unverdicted novelty 5.0

    CRAFT is a continual learning method for LLMs that learns low-rank interventions on hidden representations, using a unified KL-divergence objective to handle task routing by output divergence, forgetting control via p...