Indoor Semantic Segmentation using depth information

Camille Couprie; Clément Farabet; Laurent Najman; Yann LeCun

arxiv: 1301.3572 · v2 · pith:PB4GFYSF · submitted 2013-01-16 · cs.CV

classification cs.CV

keywords depth, indoor, features, information, scenes, segmentation, accuracy, addresses

abstract

This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. While this area of research has gained much attention recently, most works still rely on hand-crafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and the depth information. We obtain state-of-the-art results on the NYU-v2 depth dataset, with an accuracy of 64.5%. We illustrate the labeling of indoor scenes in video sequences that could be processed in real time using appropriate hardware such as an FPGA.
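The abstract says only that a multiscale convolutional network learns features from the images and the depth information. As a rough illustration, and not the authors' implementation, the sketch below treats depth as a fourth input channel next to RGB, applies one shared filter bank at several scales of a resolution pyramid, upsamples each scale's feature maps back to full resolution, concatenates them, and labels every pixel with a linear classifier. Assumptions: the helper names (`normalize`, `downsample2`, `conv2d_same`, `multiscale_features`, `label_pixels`) are hypothetical; a plain 2x2 average-pooling pyramid and a single convolution layer with random, untrained filters stand in for the paper's trained network; the input image and the 4-class setup are synthetic.

```python
import numpy as np


def normalize(rgbd):
    """Standardize each channel (R, G, B, depth) to zero mean and unit variance."""
    mean = rgbd.mean(axis=(0, 1), keepdims=True)
    std = rgbd.std(axis=(0, 1), keepdims=True) + 1e-8
    return (rgbd - mean) / std


def downsample2(x):
    """Halve height and width by 2x2 average pooling; x is (H, W, C) with even H, W."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def conv2d_same(x, filters):
    """Naive 'same'-size 2D cross-correlation with edge padding.

    x: (H, W, C_in); filters: (k, k, C_in, C_out) with odd k. Returns (H, W, C_out).
    """
    k = filters.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w, _ = x.shape
    out = np.zeros((h, w, filters.shape[3]))
    for i in range(k):
        for j in range(k):
            # Add the contribution of filter tap (i, j) for every output pixel at once.
            out += xp[i:i + h, j:j + w, :] @ filters[i, j]
    return out


def multiscale_features(rgbd, filters, n_scales=3):
    """Apply the SAME filter bank at each pyramid scale, upsample, and concatenate.

    H and W must be divisible by 2 ** (n_scales - 1).
    """
    h, w, _ = rgbd.shape
    maps = []
    x = rgbd
    for s in range(n_scales):
        f = np.tanh(conv2d_same(x, filters))  # one conv layer + tanh nonlinearity
        factor = 2 ** s
        # Nearest-neighbour upsampling back to the full (H, W) resolution.
        up = np.repeat(np.repeat(f, factor, axis=0), factor, axis=1)
        maps.append(up[:h, :w])
        if s < n_scales - 1:
            x = downsample2(x)
    return np.concatenate(maps, axis=-1)


def label_pixels(features, weights, bias):
    """Per-pixel linear classifier: the predicted label is the argmax of the class scores."""
    scores = features @ weights + bias  # (H, W, n_classes)
    return scores.argmax(axis=-1)


# Synthetic usage example (random data and untrained weights).
rng = np.random.default_rng(0)
rgb = rng.random((16, 16, 3))
depth = rng.random((16, 16, 1)) * 5.0  # hypothetical depth in metres
rgbd = normalize(np.concatenate([rgb, depth], axis=-1))  # 4-channel RGB-D input

filters = rng.normal(scale=0.1, size=(5, 5, 4, 8))  # 5x5 filters, 4 -> 8 channels
feats = multiscale_features(rgbd, filters, n_scales=3)  # 8 maps x 3 scales

n_classes = 4  # hypothetical number of labels
W = rng.normal(size=(feats.shape[-1], n_classes))
b = np.zeros(n_classes)
labels = label_pixels(feats, W, b)
print(feats.shape, labels.shape)  # prints (16, 16, 24) (16, 16)
```

Sharing the same filters across scales lets each pixel's concatenated feature vector describe its neighbourhood at several context sizes for the price of one filter bank; in a trained model the filters and classifier weights would be learned from labelled RGB-D data rather than drawn at random.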

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DEGround: An Effective Baseline for Ego-centric 3D Visual Grounding with a Homogeneous Framework
cs.CV 2025-06 unverdicted novelty 6.0

DEGround presents a unified homogeneous framework for 3D visual grounding with shared queries and two plug-in modules for better instruction alignment, reporting a 7.52% improvement on the EmbodiedScan benchmark.
Dithering Defense: Adversarial Robustness of Vision Foundation Models via Multi-Level Floyd-Steinberg Dithering
cs.CV 2026-05 unverdicted novelty 5.0

Multi-level Floyd-Steinberg dithering defends DINOv2 and PaliGemma models against PGD, MI-FGSM and SIA attacks on six tasks while causing less clean-input degradation than diffusion denoising or other baselines.
CL-DMDF:Dynamic Multimodal Data Fusion Model Based on Contrastive Learning
cs.LG 2026-06 unverdicted novelty 4.0

CL-DMDF is a new multimodal fusion architecture that uses feature-modality attention, centroid-based contrastive learning, and adaptive fusion to improve performance under missing modalities.
A review on deep learning techniques for 3D sensed data classification
cs.CV 2019-07 unverdicted novelty 1.0

A survey of deep learning architectures for 3D sensed data classification covering RGB-D, multi-view, volumetric and end-to-end methods along with datasets and future directions.