Hierarchical Deep Recurrent Architecture for Video Understanding

arxiv: 1707.03296 · v1 · submitted 2017-07-11 · cs.CV

Luming Tang, Boyang Deng, Haiyu Zhao, Shuai Yi

Abstract

This paper introduces the system we developed for the YouTube-8M Video Understanding Challenge, in which a large-scale benchmark dataset was used for multi-label video classification. The proposed framework has a hierarchical deep architecture consisting of a frame-level sequence modeling part and a video-level classification part. In the frame-level sequence modeling part, we explore a set of methods, including Pooling-LSTM (PLSTM), Hierarchical-LSTM (HLSTM), and Random-LSTM (RLSTM), to cope with the large number of frames in each video. We also introduce two attention pooling methods, single attention pooling (ATT) and multiple attention pooling (Multi-ATT), so that the model can focus on the informative frames in a video and ignore the uninformative ones. In the video-level classification part, two methods are proposed to improve classification performance: Hierarchical-Mixture-of-Experts (HMoE) and Classifier Chains (CC). Our final submission is an ensemble of 18 sub-models. On the official evaluation metric, Global Average Precision (GAP) at 20, our best submission achieves 0.84346 on the public 50% of the test data and 0.84333 on the private 50%.
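The reported scores use GAP at 20. As commonly described for the YouTube-8M challenge, each video contributes its top 20 (confidence, class) predictions; all of these are pooled across videos, ranked globally by confidence, and scored as GAP = Σ_i p(i)·Δr(i), where p(i) is the precision over the first i pooled predictions and Δr(i) is the recall increment at rank i (1 / total ground-truth labels if prediction i is correct, else 0). The following is a minimal pure-Python sketch under that assumption; `gap_at_k` is a hypothetical helper, not the challenge's official evaluation code.

```python
from typing import Sequence


def gap_at_k(
    predictions: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    k: int = 20,
) -> float:
    """Sketch of Global Average Precision over each video's top-k predictions.

    predictions[v][c]: confidence for class c on video v (assumed layout).
    labels[v][c]: 1 if class c is a ground-truth label of video v, else 0.
    """
    pooled = []  # (confidence, is_positive) pairs collected from all videos
    num_positives = 0  # every ground-truth label counts, even outside the top k
    for scores, truth in zip(predictions, labels):
        num_positives += sum(truth)
        # Keep only the k highest-confidence classes for this video.
        top = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]
        pooled.extend((scores[c], truth[c]) for c in top)

    if num_positives == 0:
        return 0.0

    # Rank all pooled predictions globally by confidence.
    pooled.sort(key=lambda pair: pair[0], reverse=True)

    gap = 0.0
    hits = 0
    for rank, (_, is_positive) in enumerate(pooled, start=1):
        if is_positive:
            hits += 1
            # Precision at this rank times the recall increment 1 / num_positives.
            gap += (hits / rank) / num_positives
    return gap


if __name__ == "__main__":
    preds = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.3]]
    truth = [[1, 0, 0], [0, 0, 1]]
    print(gap_at_k(preds, truth, k=2))  # 0.75
```

Pooling across videos before ranking, rather than averaging per-video AP, means a model is also rewarded for calibrating its confidences consistently across videos, not just for ordering classes correctly within each video.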