Hierarchical Memory Networks

Gerald Tesauro; Hugo Larochelle; Pascal Vincent; Sarath Chandar; Sungjin Ahn; Yoshua Bengio

arxiv: 1605.07427 · v1 · pith:VEKXUQ6Fnew · submitted 2016-05-24 · stat.ML · cs.CL · cs.LG · cs.NE

Sarath Chandar, Sungjin Ahn, Hugo Larochelle, Pascal Vincent, Gerald Tesauro, Yoshua Bengio

classification: stat.ML, cs.CL, cs.LG, cs.NE

keywords: memory, attention, hierarchical, network, networks, hard, soft, challenging

Memory networks are neural networks with an explicit memory component that the network can both read from and write to. The memory is often addressed in a soft way using a softmax function, which makes end-to-end training with backpropagation possible. However, this approach does not scale computationally to applications that require the network to read from extremely large memories. On the other hand, it is well known that hard attention mechanisms based on reinforcement learning are challenging to train successfully. In this paper, we explore a form of hierarchical memory network, which can be considered a hybrid between hard and soft attention memory networks. The memory is organized in a hierarchical structure such that reading from it requires less computation than soft attention over a flat memory, while also being easier to train than hard attention over a flat memory. Specifically, we propose to incorporate Maximum Inner Product Search (MIPS) in the training and inference procedures for our hierarchical memory network. We explore the use of various state-of-the-art approximate MIPS techniques and report results on SimpleQuestions, a challenging large-scale factoid question answering task.
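The contrast drawn in the abstract, between a softmax over every memory slot and a read restricted to the slots found by inner product search, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `soft_read` and `topk_mips_read`, the memory size, and the use of exact brute-force search in place of an approximate MIPS index are all assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def soft_read(memory, query):
    """Soft attention over the whole memory: scores all N slots."""
    weights = softmax(memory @ query)
    return weights @ memory

def topk_mips_read(memory, query, k):
    """Softmax restricted to the k slots with the largest inner product.

    Hypothetical helper: the scores are computed by brute force here,
    where an approximate MIPS index would be used at scale.
    """
    scores = memory @ query
    top = np.argpartition(-scores, k - 1)[:k]  # indices of the k largest scores
    weights = softmax(scores[top])
    return weights @ memory[top], top

rng = np.random.default_rng(0)
memory = rng.normal(size=(1000, 16))  # 1000 memory slots of dimension 16
query = rng.normal(size=16)

full = soft_read(memory, query)
approx, selected = topk_mips_read(memory, query, k=10)
```

With `k` equal to the number of slots, the restricted read reduces to the full soft read; smaller `k` trades some fidelity for a softmax over far fewer entries, provided the search step itself is cheaper than scoring every slot.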

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learning to Theorize the World from Observation
cs.LG 2026-05 unverdicted novelty 6.0

NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
cs.CV 2023-08 unverdicted novelty 6.0

DragNUWA integrates text, image, and trajectory controls into a diffusion video model using a Trajectory Sampler, Multiscale Fusion, and Adaptive Training to enable fine-grained open-domain video generation.

Pyramid: A General Framework for Distributed Similarity Search
cs.DC 2019-06 unverdicted novelty 6.0

Pyramid is a distributed similarity search framework based on HNSW that partitions datasets into similar-item sub-datasets for efficient query processing and includes failure recovery and straggler mitigation.