Multimodal Recurrent Neural Networks with Information Transfer Layers for Indoor Scene Labeling

arxiv: 1803.04687 · v1 · pith:EUH5ISUN · submitted 2018-03-13 · 💻 cs.CV

Abrar H. Abdulnabi, Bing Shuai, Zhen Zuo, Lap-Pui Chau, Gang Wang

classification 💻 cs.CV

keywords rnns, information, model, multimodal, previous, features, hidden, image

This paper proposes a new method, Multimodal RNNs, for RGB-D scene semantic segmentation. The model is optimized to classify image pixels given two input sources: RGB color channels and depth maps. It trains two recurrent neural networks (RNNs) simultaneously; the two are cross-connected through information transfer layers, which learn to adaptively extract relevant cross-modality features. Each RNN learns its representations from its own previous hidden states and from patterns transferred from the other RNN's previous hidden states; thus, both model-specific and cross-modality features are retained. We exploit the structure of quad-directional 2D-RNNs to model short- and long-range contextual information in the 2D input image. We carefully design various baselines to efficiently examine our proposed model structure. We test Multimodal RNNs on popular RGB-D benchmarks and show that it significantly outperforms previous methods and achieves results competitive with other state-of-the-art works.
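
The cross-connection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-direction 1D sequence instead of quad-directional 2D-RNNs, a tanh update, and random untrained weights, and every name in it (cross_rnn_step, run_multimodal_rnn, W_in, W_rec, W_transfer) is hypothetical. It shows only the idea that each modality's hidden state is computed from its own input, its own previous hidden state, and a transferred projection of the other modality's previous hidden state.

```python
import math
import random


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix (a stand-in for learned parameters)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def cross_rnn_step(x, h_own, h_other, W_in, W_rec, W_transfer):
    """One step for one modality.

    Assumed update: tanh(W_in x + W_rec h_own + W_transfer h_other),
    where W_transfer plays the role of an information transfer layer.
    """
    a = matvec(W_in, x)
    b = matvec(W_rec, h_own)
    c = matvec(W_transfer, h_other)
    return [math.tanh(ai + bi + ci) for ai, bi, ci in zip(a, b, c)]


def run_multimodal_rnn(rgb_seq, depth_seq, hidden=4, seed=0):
    """Run two cross-connected RNNs over paired RGB and depth sequences."""
    rng = random.Random(seed)
    d_rgb, d_depth = len(rgb_seq[0]), len(depth_seq[0])
    rgb_params = (rand_matrix(hidden, d_rgb, rng),
                  rand_matrix(hidden, hidden, rng),
                  rand_matrix(hidden, hidden, rng))   # depth -> RGB transfer
    depth_params = (rand_matrix(hidden, d_depth, rng),
                    rand_matrix(hidden, hidden, rng),
                    rand_matrix(hidden, hidden, rng))  # RGB -> depth transfer
    h_rgb = [0.0] * hidden
    h_depth = [0.0] * hidden
    outputs = []
    for x_rgb, x_depth in zip(rgb_seq, depth_seq):
        # Both updates read the *previous* hidden states of both modalities.
        new_rgb = cross_rnn_step(x_rgb, h_rgb, h_depth, *rgb_params)
        new_depth = cross_rnn_step(x_depth, h_depth, h_rgb, *depth_params)
        h_rgb, h_depth = new_rgb, new_depth
        # Concatenate both hidden states as the per-step feature.
        outputs.append(h_rgb + h_depth)
    return outputs
```

In this sketch, a change in the depth input at one step leaves the RGB hidden state at that step untouched and reaches it only at the next step, through the transfer weights. A per-pixel classifier would sit on top of the concatenated features; that part is omitted here.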
