Overcoming catastrophic forgetting with hard attention to the task

Alexandros Karatzoglou; Dídac Surís; Joan Serrà; Marius Miron

arXiv: 1801.01423 · v3 · pith:KKQDDC7Unew · submitted 2018-01-04 · cs.LG · cs.AI · cs.NE · stat.ML

Abstract

Catastrophic forgetting occurs when a neural network loses the information learned in a previous task after training on subsequent tasks. This problem remains a hurdle for artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard attention mechanism that preserves previous tasks' information without affecting the current task's learning. A hard attention mask is learned concurrently with each task through stochastic gradient descent, and previous masks are exploited to condition such learning. We show that the proposed mechanism is effective for reducing catastrophic forgetting, cutting current rates by 45 to 80%. We also show that it is robust to different hyperparameter choices and that it offers a number of monitoring capabilities. The approach makes it possible to control both the stability and the compactness of the learned knowledge, which we believe also makes it attractive for online learning or network compression applications.
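The abstract describes two ingredients: a per-task attention mask over units, learned by SGD, and the use of earlier masks to condition later learning. The sketch below illustrates one plausible way to combine them on a single linear layer; it is an assumption-laden toy, not the authors' implementation. It assumes each mask is a sigmoid of a task embedding scaled by a large factor `s` (so it is nearly binary), that masks from past tasks are accumulated with an elementwise max, and that the gradient of a weight is scaled by `1 - min(cum_out[i], cum_in[j])` so weights linking units reserved by earlier tasks are left (almost) untouched. All function names, sizes, and values are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def task_mask(embedding, s):
    # Assumed gate: per-unit value in (0, 1); a large s makes it nearly 0/1 ("hard").
    return [sigmoid(s * e) for e in embedding]

def accumulate(cum_mask, new_mask):
    # Assumed rule: a unit stays reserved once any earlier task used it (elementwise max).
    return [max(c, m) for c, m in zip(cum_mask, new_mask)]

def condition_gradient(grad, cum_out, cum_in):
    # grad[i][j] is dLoss/dW[i][j] for output unit i and input unit j.
    # Weights joining two units reserved by earlier tasks get (almost) no update.
    return [[(1.0 - min(co, ci)) * g for g, ci in zip(row, cum_in)]
            for row, co in zip(grad, cum_out)]

def forward(W, x, mask_out):
    # Masked linear layer: output units outside the current task's mask are zeroed.
    return [m * sum(w * xi for w, xi in zip(row, x)) for row, m in zip(W, mask_out)]

# --- Toy demo (all values hypothetical) ---
random.seed(0)
S_MAX = 400.0  # assumed large gate scale, so the masks are effectively binary

# Pretend task 1 ended with embeddings that use output units 0 and 1, but not unit 2.
cum_out = accumulate([0.0] * 3, task_mask([1.0, 1.0, -1.0], S_MAX))
cum_in = accumulate([0.0] * 4, task_mask([1.0] * 4, S_MAX))  # all inputs in use

W = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
W_before = [row[:] for row in W]

# One SGD step for task 2 on L = 0.5 * ||y - target||^2, using all output units.
x = [random.gauss(0, 1) for _ in range(4)]
target = [random.gauss(0, 1) for _ in range(3)]
mask2 = task_mask([1.0] * 3, S_MAX)
err = [yi - ti for yi, ti in zip(forward(W, x, mask2), target)]  # dL/dy
grad = [[e * m * xj for xj in x] for e, m in zip(err, mask2)]    # dL/dW
step = condition_gradient(grad, cum_out, cum_in)
W = [[w - 0.1 * g for w, g in zip(rw, rg)] for rw, rg in zip(W, step)]

# Largest weight change per output unit: reserved units 0 and 1 do not move.
delta = [max(abs(a - b) for a, b in zip(r1, r0)) for r1, r0 in zip(W, W_before)]
print(delta)
```

Because the accumulated mask in this sketch only grows, capacity claimed by a task is never released; controlling compactness, as the abstract mentions, would additionally require something like a penalty on how many units each new mask switches on, which the sketch omits.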

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Mesh Based Simulations with Spatial and Temporal awareness
cs.LG 2026-05 unverdicted novelty 5.0

A unified training framework for mesh-based ML surrogates in CFD improves accuracy and long-horizon stability by enforcing spatial derivative consistency via multi-node prediction, using temporal cross-attention corre...
Autoencoder-Based Incremental Class Learning without Retraining on Old Data
cs.LG 2019-07 unverdicted novelty 4.0

An autoencoder extracts class prototypes whose means enable metric classification in incremental learning, matching SOTA accuracy with lower memory overhead on CIFAR-100 and CUB-200-2011 via regularization to avoid forgetting.