arxiv: 1406.6247 · v1 · pith:XRD27ZI3new · submitted 2014-06-24 · 💻 cs.LG · cs.CV · stat.ML

Recurrent Models of Visual Attention

classification 💻 cs.LG cs.CV stat.ML
keywords image model neural convolutional amount computation images network

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
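The claim that computation can be controlled independently of the input image size can be illustrated with a minimal sketch of fixed-size region extraction. The abstract does not specify how regions are extracted, so everything here is an assumption made for illustration: the function name `extract_glimpse`, the normalized `[-1, 1]` location coordinates, the default patch size, and the zero padding at the borders. It is not the authors' implementation, and it omits the recurrent network and the reinforcement learning training entirely.

```python
import numpy as np


def extract_glimpse(image, center, size=8):
    """Return a size x size patch of a 2-D `image` centered near `center`.

    `center` is (row, col) in normalized coordinates in [-1, 1], where
    (-1, -1) is the top-left pixel and (1, 1) the bottom-right pixel.
    (Hypothetical convention, chosen for this sketch.)
    Pixels that fall outside the image are filled with zeros.
    """
    h, w = image.shape
    # Map normalized coordinates to integer pixel indices.
    cy = int(round((center[0] + 1.0) / 2.0 * (h - 1)))
    cx = int(round((center[1] + 1.0) / 2.0 * (w - 1)))
    half = size // 2
    # Zero-pad by `half` on every side so border patches keep their size.
    padded = np.pad(image, half, mode="constant")
    # Index i in `padded` corresponds to index i - half in `image`, so this
    # slice covers rows cy - half .. cy - half + size - 1 of the original.
    return padded[cy:cy + size, cx:cx + size]


# The patch handed to later processing has the same shape for both inputs.
small = np.arange(28 * 28, dtype=float).reshape(28, 28)
large = np.zeros((512, 512))
print(extract_glimpse(small, (0.0, 0.0)).shape)
print(extract_glimpse(large, (0.5, -0.5)).shape)
```

Both calls yield an 8 x 8 patch, so any network that consumes the patch does the same amount of work per step whether the input is 28 x 28 or 512 x 512. Under this sketch, total cost would be governed by the patch size and the number of locations visited rather than by the pixel count.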

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Is It Worth the Attention? A Comparative Evaluation of Attention Layers for Argument Unit Segmentation

    cs.CL · 2019-06 · unverdicted · novelty 3.0

    Attention layers do not improve BiLSTM performance on argument unit segmentation and contextualized embeddings show little benefit.