One-shot Learning with Memory-Augmented Neural Networks

Adam Santoro; Daan Wierstra; Matthew Botvinick; Sergey Bartunov; Timothy Lillicrap

arxiv: 1605.06065 · v1 · pith:RJBUDTORnew · submitted 2016-05-19 · 💻 cs.LG

classification 💻 cs.LG

keywords data, memory, neural, networks, ability, information, learning, memory-augmented

Abstract

Despite recent breakthroughs in the applications of deep neural networks, one setting that presents a persistent challenge is that of "one-shot learning." Traditional gradient-based networks require a lot of data to learn, often through extensive iterative training. When new data is encountered, the models must inefficiently relearn their parameters to adequately incorporate the new information without catastrophic interference. Architectures with augmented memory capacities, such as Neural Turing Machines (NTMs), offer the ability to quickly encode and retrieve new information, and hence can potentially obviate the downsides of conventional models. Here, we demonstrate the ability of a memory-augmented neural network to rapidly assimilate new data, and leverage this data to make accurate predictions after only a few samples. We also introduce a new method for accessing an external memory that focuses on memory content, unlike previous methods that additionally use memory location-based focusing mechanisms.
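
The abstract refers to accessing an external memory by content. As a rough illustration of content-based reading in general, and not the paper's actual access module, the sketch below scores each memory row against a query key by cosine similarity, turns the scores into weights with a softmax, and returns the weighted sum of rows. The function names, the toy memory, and the epsilon constant are all assumptions made for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Small epsilon (an assumption of this sketch) avoids division by zero.
    return dot / (norm_a * norm_b + 1e-8)

def content_read(memory, key):
    """Read from memory by content: softmax over similarities, then a weighted sum of rows."""
    sims = [cosine_similarity(row, key) for row in memory]
    max_sim = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - max_sim) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    width = len(memory[0])
    read = [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]
    return weights, read

# Hypothetical toy memory with three rows of width two.
memory = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
weights, read = content_read(memory, key=[1.0, 0.1])
```

Here the first row receives the largest weight because it points in nearly the same direction as the key. No location information (such as row index or shifts) enters the computation, which is the sense in which the read is purely content-based.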

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Reformer: The Efficient Transformer
cs.LG 2020-01 accept novelty 8.0

Reformer matches standard Transformer accuracy on long sequences while using far less memory and running faster via LSH attention and reversible residual layers.

Half a Percent of Labels is Enough: Efficient Animal Detection in UAV Imagery using Deep CNNs and Active Learning
cs.CV 2019-07 unverdicted novelty 7.0

Transfer Sampling with Optimal Transport and window cropping finds nearly 80% of animals in new UAV datasets using under 0.5% of labels.

A Self-Attentive model for Knowledge Tracing
cs.LG 2019-07 unverdicted novelty 7.0

SAKT uses self-attention to focus on relevant prior KCs for performance prediction and reports 4.43% average AUC improvement over DKT and DKVMN on real datasets.

Compressive Transformers for Long-Range Sequence Modelling
cs.LG 2019-11 unverdicted novelty 6.0

Compressive Transformer sets new records on WikiText-103 (17.1 ppl) and Enwik8 (0.97 bpc) via memory compression and introduces the PG-19 long-range language benchmark.