On Tiny Episodic Memories in Continual Learning

Arslan Chaudhry , Marcus Rohrbach , Mohamed Elhoseiny , Thalaiyasingam Ajanthan , Puneet K. Dokania , Philip H. S. Torr , Marc'Aurelio Ranzato

classification: cs.LG, stat.ML

keywords: tasks, memory, episodic, examples, learning, training, continual, example

In continual learning (CL), an agent learns from a stream of tasks, leveraging prior experience to transfer knowledge to future tasks. It is an ideal framework for decreasing the amount of supervision required by existing learning algorithms. But for successful knowledge transfer, the learner needs to remember how to perform previous tasks. One way to endow the learner with the ability to perform tasks seen in the past is to keep a small memory, dubbed episodic memory, that stores a few examples from previous tasks, and then to replay these examples when training on future tasks. In this work, we empirically analyze the effectiveness of a very small episodic memory in a CL setup where each training example is seen only once. Surprisingly, across four rather different supervised learning benchmarks adapted to CL, a very simple baseline that jointly trains on examples from the current task and examples stored in the episodic memory significantly outperforms specifically designed CL approaches with and without episodic memory. Interestingly, we find that repetitive training on even tiny memories of past tasks does not harm generalization; on the contrary, it improves it, with gains between 7% and 17% when the memory is populated with a single example per class.
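
The replay baseline described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the memory is written, so this example assumes per-class reservoir sampling; the names `TinyEpisodicMemory`, `replay_stream`, and `train_step` are hypothetical and are not taken from the paper. The model update is left as a caller-supplied function, so only the data flow is shown: each incoming batch is seen once and is trained jointly with a few examples drawn from memory.

```python
import random
from collections import defaultdict


class TinyEpisodicMemory:
    """Hypothetical helper: keeps at most `per_class` examples per label."""

    def __init__(self, per_class=1, seed=0):
        self.per_class = per_class
        self.slots = defaultdict(list)  # label -> stored examples
        self.seen = defaultdict(int)    # label -> examples observed so far
        self.rng = random.Random(seed)

    def add(self, x, y):
        # Assumption: per-class reservoir sampling, so every example of a
        # class observed so far is equally likely to be held in memory.
        self.seen[y] += 1
        bucket = self.slots[y]
        if len(bucket) < self.per_class:
            bucket.append(x)
        else:
            j = self.rng.randrange(self.seen[y])
            if j < self.per_class:
                bucket[j] = x

    def sample(self, k):
        # Draw up to k stored (x, y) pairs uniformly, without replacement.
        pool = [(x, y) for y, xs in self.slots.items() for x in xs]
        return self.rng.sample(pool, min(k, len(pool)))

    def __len__(self):
        return sum(len(xs) for xs in self.slots.values())


def replay_stream(stream, memory, replay_size, train_step):
    """Single pass over the stream; each batch is trained jointly with replay."""
    for batch in stream:
        joint = list(batch) + memory.sample(replay_size)
        train_step(joint)        # one update on current + replayed examples
        for x, y in batch:       # then write the new examples to memory
            memory.add(x, y)


# Usage: two toy "tasks", one batch each, with one memory slot per class.
seen_batches = []
mem = TinyEpisodicMemory(per_class=1)
stream = [[("a0", 0), ("b0", 1)], [("c0", 2), ("d0", 3)]]
replay_stream(stream, mem, replay_size=2, train_step=seen_batches.append)
```

In this toy run the first update sees only the two current examples because the memory is still empty, while the second update sees the two new examples plus the two stored ones from the earlier task. In practice `train_step` would wrap a gradient step of whatever model is being trained.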

Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
cs.AI 2023-06 conditional novelty 8.0

LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.
DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts
cs.LG 2026-05 accept novelty 7.0

DRIFT is a benchmark for task-free continual graph learning under continuous distribution shifts, demonstrating that standard methods degrade without task boundary information.
KAN-CL: Per-Knot Importance Regularization for Continual Learning with Kolmogorov-Arnold Networks
cs.LG 2026-05 conditional novelty 7.0

KAN-CL cuts catastrophic forgetting by 88-93% on Split-CIFAR-10/5T and Split-CIFAR-100/10T by anchoring KAN parameters at per-knot granularity while matching baseline accuracy.
Online Continual Learning with Dynamic Label Hierarchies
cs.LG 2026-05 unverdicted novelty 7.0

HALO improves online continual learning under evolving label hierarchies by adaptively combining classification heads regularized with organized learnable prototypes for better adaptation and reduced forgetting.
MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound
cs.LG 2026-05 unverdicted novelty 7.0

MIST fixes unreliable splits in streaming decision trees for class-incremental learning by using a K-independent McDiarmid bound on Gini impurity, Bayesian moment projection for knowledge transfer, and KLL quantile sk...
Continual Learning for fMRI-Based Brain Disorder Diagnosis via Functional Connectivity Matrices Generative Replay
q-bio.TO 2026-04 conditional novelty 7.0

A structure-aware VAE generates realistic FC matrices for replay, combined with multi-level knowledge distillation and hierarchical contextual bandit sampling, to enable continual fMRI-based brain disorder diagnosis a...
Direct Discrepancy Replay: Distribution-Discrepancy Condensation and Manifold-Consistent Replay for Continual Face Forgery Detection
cs.CV 2026-04 unverdicted novelty 7.0

A replay method for continual face forgery detection condenses real-fake distribution discrepancies into compact maps and synthesizes compatible samples from current real faces to reduce forgetting under tight memory ...
SLE-FNO: Single-Layer Extensions for Task-Agnostic Continual Learning in Fourier Neural Operators
cs.LG 2026-03 unverdicted novelty 7.0

SLE-FNO achieves zero forgetting and strong plasticity-stability balance in continual learning for FNO surrogate models of pulsatile blood flow by adding minimal single-layer extensions across four out-of-distribution tasks.
Continual Fine-Tuning of Large Language Models via Program Memory
cs.LG 2026-05 unverdicted novelty 6.0

ProCL organizes LoRA adapters into input-conditioned program memory slots that combine with a distributed adapter to improve retention and reduce forgetting in continual LLM fine-tuning.
DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts
cs.LG 2026-05 unverdicted novelty 6.0

DRIFT benchmark shows substantial performance degradation for continual graph learning methods under task-free continuous distribution shifts modeled via Gaussian mixtures.
Critical Patch-Aware Sparse Prompting with Decoupled Training for Continual Learning on the Edge
cs.LG 2026-04 unverdicted novelty 6.0

CPS-Prompt delivers 1.6x gains in peak memory, training time, and energy on edge hardware for continual learning while staying within 2% accuracy of top prompt-based baselines.
HEDP: A Hybrid Energy-Distance Prompt-based Framework for Domain Incremental Learning
cs.AI 2026-05 unverdicted novelty 5.0

HEDP uses energy regularization inspired by Helmholtz free energy plus hybrid energy-distance weighting in prompts to improve domain selection and achieve a 2.57% accuracy gain on benchmarks like CORe50 while mitigati...
CoMemNet: Contrastive Sampling with Memory Replay Network for Continual Traffic Prediction
cs.LG 2026-05 unverdicted novelty 5.0

CoMemNet is a dual-branch continual learning model for dynamic traffic networks that combines contrastive sampling via Wasserstein features and memory replay to achieve SOTA performance while mitigating forgetting.
Face-D²CL: Multi-Domain Synergistic Representation with Dual Continual Learning for Facial DeepFake Detection
cs.CV 2026-04 unverdicted novelty 4.0

Face-D²CL fuses spatial and frequency features and uses dual continual learning to reduce forgetting while adapting to new DeepFakes, cutting average error rates by 60.7% and raising unseen-domain AUC by 7.9% over prior SOTA.