Learning to Remember Rare Events

Aurko Roy; Łukasz Kaiser; Ofir Nachum; Samy Bengio

arXiv: 1703.03129 · v1 · submitted 2017-03-09 · cs.LG

Keywords: learning, life-long, module, one-shot, deep, memory, networks, neural

Abstract

Despite recent advances, memory-augmented deep neural networks are still limited when it comes to life-long and one-shot learning, especially in remembering rare events. We present a large-scale life-long memory module for use in deep learning. The module exploits fast nearest-neighbor algorithms for efficiency and thus scales to large memory sizes. Except for the nearest-neighbor query, the module is fully differentiable and trained end-to-end with no extra supervision. It operates in a life-long manner, i.e., without the need to reset it during training. Our memory module can be easily added to any part of a supervised neural network. To show its versatility, we add it to a number of networks, from simple convolutional ones tested on image classification to deep sequence-to-sequence and recurrent-convolutional models. In all cases, the enhanced network gains the ability to remember and do life-long one-shot learning. Our module remembers training examples shown many thousands of steps in the past and can successfully generalize from them. We set a new state of the art for one-shot learning on the Omniglot dataset and demonstrate, for the first time, life-long one-shot learning in recurrent neural networks on a large-scale machine translation task.
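The abstract describes a memory that is read with a nearest-neighbor query and written to continuously, never reset. The following is a minimal pure-Python sketch of such a key-value memory, assuming unit-normalized keys compared by cosine similarity, integer class labels as values, and a per-slot age that decides which slot to overwrite. The class name `NearestNeighborMemory`, the `-1` empty-slot marker, and the exact hit/miss update rule are illustrative assumptions, not the authors' implementation; the sketch also omits the trainable encoder and the training loss, and it uses exhaustive search where the paper relies on fast nearest-neighbor algorithms.

```python
import math


def normalize(v):
    """Scale v to unit L2 norm; a zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class NearestNeighborMemory:
    """Hypothetical key-value memory: unit-norm keys, integer labels, slot ages."""

    EMPTY = -1  # assumption: labels are non-negative, -1 marks an unused slot

    def __init__(self, size, dim):
        self.keys = [[0.0] * dim for _ in range(size)]
        self.values = [self.EMPTY] * size
        self.ages = [0] * size

    def query(self, q, k=1):
        """Indices of the k slots whose keys have the highest cosine similarity to q."""
        q = normalize(q)
        sims = [dot(q, key) for key in self.keys]
        # Exhaustive search; a large memory would use an approximate NN index instead.
        return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]

    def predict(self, q):
        """Return the label stored with the nearest key."""
        return self.values[self.query(q, k=1)[0]]

    def update(self, q, label):
        """Write one labelled example; the memory is never reset."""
        q = normalize(q)
        nearest = self.query(q, k=1)[0]
        self.ages = [a + 1 for a in self.ages]  # every slot grows one step older
        if self.values[nearest] == label:
            # Hit: pull the matching key towards the query and mark it fresh.
            merged = [a + b for a, b in zip(q, self.keys[nearest])]
            self.keys[nearest] = normalize(merged)
            self.ages[nearest] = 0
        else:
            # Miss: overwrite the oldest slot (first one on ties) with the new example.
            oldest = max(range(len(self.ages)), key=lambda i: self.ages[i])
            self.keys[oldest] = q
            self.values[oldest] = label
            self.ages[oldest] = 0


# Usage: a single example of a "rare" class is enough to recall its label later.
mem = NearestNeighborMemory(size=4, dim=3)
mem.update([1.0, 0.0, 0.0], label=7)
print(mem.predict([0.9, 0.1, 0.0]))  # prints 7
```

The age-based overwrite is what lets the memory run in a life-long manner: correctly recalled slots are refreshed and kept, while slots that have not been useful for the longest time are recycled for new examples. Swapping the exhaustive `sorted` call for an approximate nearest-neighbor index is the natural change when the memory grows large.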

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Few-Shot Video Classification via Temporal Alignment
cs.CV 2019-06 unverdicted novelty 6.0

TAM aligns query video frames to novel class examples, averages per-frame distances along the path, and uses continuous relaxation for end-to-end few-shot optimization, yielding gains on Kinetics and Something-Something-V2.