Video Scene Parsing with Predictive Feature Learning

arxiv: 1612.00119 · v2 · pith:XT4WBMXE · submitted 2016-12-01 · cs.CV


Xiaojie Jin, Xin Li, Huaxin Xiao, Xiaohui Shen, Zhe Lin, Jimei Yang, Yunpeng Chen, Jian Dong, Luoqi Liu, Zequn Jie, Jiashi Feng, Shuicheng Yan


classification 💻 cs.CV

keywords: parsing, video, scene, learning, methods, features, annotations, challenging




In this work, we address the challenging video scene parsing problem by developing effective representation learning methods given limited parsing annotations. In particular, we contribute two novel methods that constitute a unified parsing framework. (1) \textbf{Predictive feature learning} from nearly unlimited unlabeled video data. Unlike existing methods that learn features from single-frame parsing, we learn spatiotemporal discriminative features by forcing a parsing network to predict future frames and their parsing maps (if available) given only historical frames. In this way, the network learns to capture video dynamics and temporal context, which are critical cues for video scene parsing, without requiring extra manual annotations. (2) A \textbf{prediction steering parsing} architecture that adapts the learned spatiotemporal features to scene parsing tasks and provides strong guidance for any off-the-shelf parsing model to achieve better video scene parsing performance. Extensive experiments on two challenging datasets, Cityscapes and CamVid, demonstrate the effectiveness of our methods, showing significant improvement over well-established baselines.
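To make the predictive setup concrete, the following is a minimal sketch of how training samples could be drawn from an unlabeled clip: each sample pairs a window of historical frames with the frame that follows it, so no manual annotation is involved. The function names `make_prediction_pairs` and `frame_mse`, the default window length of 4, and the use of mean squared error as the prediction loss are all illustrative assumptions, not details taken from the abstract.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def make_prediction_pairs(
    frames: Sequence[Frame], history: int = 4
) -> List[Tuple[List[Frame], Frame]]:
    """Build (historical frames, future frame) samples from an unlabeled clip.

    `history` is a hypothetical window length chosen for illustration.
    """
    if history < 1:
        raise ValueError("history must be at least 1")
    pairs = []
    for t in range(history, len(frames)):
        past = list(frames[t - history:t])  # input: historical frames only
        target = frames[t]                  # supervision: the next frame itself
        pairs.append((past, target))
    return pairs


def frame_mse(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two flattened frames.

    A stand-in loss (assumption); the abstract does not specify the objective.
    """
    if len(predicted) != len(target):
        raise ValueError("frames must have the same number of pixels")
    if not predicted:
        raise ValueError("frames must not be empty")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)
```

With a clip of six frames and a window of four, this yields two samples: frames 0 to 3 predicting frame 4, and frames 1 to 4 predicting frame 5. When parsing maps exist for some frames, the same pairing could attach the future parsing map as a second target.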


