arxiv: 1511.04119 · v3 · pith:RHV7E7OJnew · submitted 2015-11-12 · 💻 cs.LG · cs.CV

Action Recognition using Visual Attention

classification 💻 cs.LG cs.CV
keywords model action attention frames learns parts recognition task

We propose a soft attention based model for the task of action recognition in videos. We use multi-layered Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units which are deep both spatially and temporally. Our model learns to focus selectively on parts of the video frames and classifies videos after taking a few glimpses. The model essentially learns which parts in the frames are relevant for the task at hand and attaches higher importance to them. We evaluate the model on UCF-11 (YouTube Action), HMDB-51 and Hollywood2 datasets and analyze how the model focuses its attention depending on the scene and the action being performed.
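The abstract does not give the model's equations, so the following is only a generic sketch of soft attention over frame regions, not the authors' implementation. It assumes each frame has already been reduced to a list of per-location feature vectors and that a relevance score per location is available (in a real model the scores would come from the recurrent network's state; here they are supplied by hand). The names `softmax`, `soft_attention`, `features`, and `scores` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, scores):
    """Weight each location's feature vector by its attention weight.

    features: list of feature vectors, one per frame location (assumed input)
    scores:   list of raw relevance scores, one per location (assumed input)
    Returns the attention weights and the weighted-average feature vector.
    """
    weights = softmax(scores)
    dim = len(features[0])
    glimpse = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return weights, glimpse


# Toy frame: a 2x2 grid of locations, each with a 3-dimensional feature.
features = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
scores = [0.0, 0.0, 0.0, 0.0]  # equal scores give uniform attention
weights, glimpse = soft_attention(features, scores)
```

Raising one location's score relative to the others shifts the weighted average toward that location's features, which is the sense in which a soft attention model "attaches higher importance" to relevant parts of a frame.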

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Inverse Attention Guided Deep Crowd Counting Network

    cs.CV 2019-07 unverdicted novelty 6.0

    IA-DCCN is a single-step VGG-16 network that infuses segmentation via inverse attention to improve crowd counting accuracy on three datasets with minimal overhead.