pith. machine review for the scientific record.

arxiv: 1707.04555 · v1 · submitted 2017-07-14 · 💻 cs.CV

Recognition: unknown

Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding

classification 💻 cs.CV
keywords: temporal, video, approaches, modeling, recognition, youtube-8m, challenge, fast-forward

This paper describes our solution for the video recognition task of the Google Cloud and YouTube-8M Video Understanding Challenge, which ranked 3rd. Because the challenge provides pre-extracted visual and audio features instead of raw videos, we mainly investigate temporal modeling approaches that aggregate frame-level features for multi-label video recognition. Our system contains three major components: a two-stream sequence model, a fast-forward sequence model, and temporal residual neural networks. Experimental results on the challenging YouTube-8M dataset demonstrate that our proposed approaches significantly improve on existing temporal modeling approaches in large-scale video recognition tasks. Notably, our fast-forward LSTM with a depth of 7 layers achieves 82.75% in terms of GAP@20 on the Kaggle Public test set.
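For readers unfamiliar with the GAP@20 metric quoted in the abstract, the following is a minimal sketch of Global Average Precision at k, assuming the commonly described challenge definition: keep the top k predictions per video, pool them across all videos, sort by confidence, and sum precision times the change in recall. The function name `gap_at_k`, the dense `(num_videos, num_classes)` array layout, and the choice to normalize by the total number of ground-truth positives are assumptions made for illustration; this is not the authors' evaluation code.

```python
import numpy as np


def gap_at_k(scores, labels, k=20):
    """Global Average Precision over the top-k predictions of each video.

    scores: (num_videos, num_classes) predicted confidences.
    labels: (num_videos, num_classes) binary ground truth.
    Both layouts are assumptions of this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    k = min(k, scores.shape[1])

    # Keep only the k highest-scoring classes for each video.
    top_idx = np.argsort(-scores, axis=1)[:, :k]
    top_scores = np.take_along_axis(scores, top_idx, axis=1).ravel()
    top_labels = np.take_along_axis(labels, top_idx, axis=1).ravel()

    # Pool all kept predictions and rank them by confidence.
    order = np.argsort(-top_scores, kind="stable")
    hits = top_labels[order]

    total_positives = labels.sum()
    if total_positives == 0:
        return 0.0

    # Precision at each rank; recall only changes at ranks that are hits.
    precision_at_i = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float((precision_at_i * hits).sum() / total_positives)
```

Because the predictions are pooled across videos before ranking, a confident mistake on one video lowers the precision credited to correct predictions on every other video ranked below it, which is what distinguishes GAP from a per-video average precision.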

This paper has not been read by Pith yet.
