Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras

Christian Kerl; Daniel Cremers; J\"org St\"uckler; Lingni Ma

Not yet reviewed by Pith; the record is open.

Re-run · record.json Download PDF Read on arXiv ↗

This paper has not been read by Pith yet. Machine review is queued; the pith claim, tier, and objections will appear here once it completes.

SPECIMEN: schema-true, not a live event

T0 review · schema-true

One-sentence machine reading of the paper's core claim.

pith:XXXXXXXX · record.json · timestamp

arxiv 1703.08866 v2 pith:NSAE6FRA submitted 2017-03-26 cs.CV

Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras

Lingni Ma , J\"org St\"uckler , Christian Kerl , Daniel Cremers This is my paper

classification cs.CV

keywords deepmulti-viewnetworksemanticpredictionsrgb-dsegmentationviews

verification ladder T0 review T1 audit T2 compute T3 formal T4 reserved

0 comments

read the original abstract

Visual scene understanding is an important capability that enables robots to purposefully act in their environment. In this paper, we propose a novel approach to object-class segmentation from multiple RGB-D views using deep learning. We train a deep neural network to predict object-class semantics that is consistent from several view points in a semi-supervised way. At test time, the semantics predictions of our network can be fused more consistently in semantic keyframe maps than predictions of a network trained on individual views. We base our network architecture on a recent single-view deep learning approach to RGB and depth fusion for semantic object-class segmentation and enhance it with multi-scale loss minimization. We obtain the camera trajectory using RGB-D SLAM and warp the predictions of RGB-D images into ground-truth annotated frames in order to enforce multi-view consistency during training. At test time, predictions from multiple views are fused into keyframes. We propose and analyze several methods for enforcing multi-view consistency during training and testing. We evaluate the benefit of multi-view consistency training and demonstrate that pooling of deep features and fusion over multiple views outperforms single-view baselines on the NYUDv2 benchmark for semantic segmentation. Our end-to-end trained network achieves state-of-the-art performance on the NYUDv2 dataset in single-view segmentation as well as multi-view semantic fusion.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A review on deep learning techniques for 3D sensed data classification
cs.CV 2019-07 unverdicted novelty 1.0

A survey of deep learning architectures for 3D sensed data classification covering RGB-D, multi-view, volumetric and end-to-end methods along with datasets and future directions.