Joint 2d-3d-semantic data for indoor scene understanding

Iro Armeni, Sasha Sax, Amir R Zamir, Silvio Savarese · 2017 · cs.CV · arXiv 1702.01105

15 Pith papers cite this work.

abstract

We present a dataset of large-scale indoor spaces that provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations. The dataset covers over 6,000 m² and contains over 70,000 RGB images, along with the corresponding depths, surface normals, semantic annotations and global XYZ images (all in the form of both regular and 360° equirectangular images), as well as camera information. It also includes registered raw and semantically annotated 3D meshes and point clouds. The dataset enables the development of joint and cross-modal learning models, and potentially of unsupervised approaches that exploit the regularities present in large-scale indoor spaces. The dataset is available here: http://3Dsemantics.stanford.edu/

citation-role summary

dataset: 3 · background: 1

citation-polarity summary

use dataset: 3 · background: 1

representative citing papers

CalibAnyView: Beyond Single-View Camera Calibration in the Wild

cs.CV · 2026-05-14 · conditional · novelty 8.0

A multi-view transformer predicts dense perspective fields that feed a geometric optimizer to estimate camera intrinsics and gravity from arbitrary numbers of real-world views.

Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI

cs.CV · 2021-09-16 · accept · novelty 8.0

HM3D offers 1000 building-scale 3D environments that are larger and higher-fidelity than existing datasets, enabling better-performing embodied AI agents for tasks like PointGoal navigation.

Resolving Long-Tail Ambiguity in Unsupervised 3D Point Cloud Segmentation with Language Priors

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

LangTail uses entity-level semantic priors from language models aligned via contrastive learning in a hierarchical clustering setup to resolve long-tail ambiguity, yielding +13.5, +12.9, and +8.9 mIoU gains on ScanNet-v2, S3DIS, and nuScenes.

VGGT-360: Geometry-Consistent Zero-Shot Panoramic Depth Estimation

cs.CV · 2026-03-19 · unverdicted · novelty 7.0

VGGT-360 delivers geometry-consistent zero-shot panoramic depth by converting panoramas into multi-view 3D reconstructions via VGGT models and three plug-and-play correction modules, then reprojecting the result.

Automatic reconstruction of fully volumetric 3D building models from point clouds

cs.GR · 2019-07-01 · unverdicted · novelty 7.0

Presents a novel integer linear programming approach for automatic reconstruction of volumetric, parametric, multi-story building models from unstructured point clouds without requiring initial room segmentation.

PointGS: Semantic-Consistent Unsupervised 3D Point Cloud Segmentation with 3D Gaussian Splatting

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

PointGS achieves semantic-consistent unsupervised 3D point cloud segmentation by using 3D Gaussian Splatting to bridge discrete points and continuous 2D images for distilling SAM semantics.

BIMStruct3D: A Fully Automated Hybrid Learning Scan-to-BIM Pipeline with Integrated Topology Refinement

cs.CV · 2026-04-27 · unverdicted · novelty 7.0

A hybrid pipeline combines semantic segmentation of point clouds with topology-aware reconstruction to generate BIM models, introduces the vIoU evaluation metric, and releases the DeKH dataset with demonstrated gains over RANSAC baselines.

Holo360D: A Large-Scale Real-World Dataset with Continuous Trajectories for Advancing Panoramic 3D Reconstruction and Beyond

cs.CV · 2026-04-24 · unverdicted · novelty 7.0

Holo360D is the first large-scale dataset providing continuous panoramic sequences with accurately aligned high-completeness depth maps and meshes for training panoramic 3D reconstruction models.

STRNet: Visual Navigation with Spatio-Temporal Representation through Dynamic Graph Aggregation

cs.CV · 2026-04-03 · conditional · novelty 7.0

STRNet improves goal-conditioned visual navigation by replacing simplistic encoders and pooling with a spatio-temporal fusion module that performs spatial graph reasoning and hybrid temporal modeling.

PanoSAMic: Panoramic Image Segmentation from SAM Feature Encoding and Dual View Fusion

cs.CV · 2026-01-12 · unverdicted · novelty 6.0

PanoSAMic modifies SAM with multi-stage feature encoding, spatio-modal fusion, spherical attention, and dual-view fusion to achieve state-of-the-art panoramic semantic segmentation on public RGB and RGB-D datasets.

PointCaM: Cut-and-Mix for Open-Set Point Cloud Learning

cs.CV · 2022-12-05 · unverdicted · novelty 6.0

PointCaM proposes a cut-and-mix mechanism with an Unknown-Point Simulator and Estimator to improve open-set recognition on point clouds by simulating out-of-distribution data and using multi-level features.

EvObj: Learning Evolving Object-centric Representations for 3D Instance Segmentation without Scene Supervision

cs.CV · 2026-05-13 · unverdicted · novelty 5.0

EvObj learns evolving object-centric representations for unsupervised 3D instance segmentation by dynamically refining object candidates and completing partial geometries to bridge the synthetic-to-real domain gap, outperforming baselines on real and synthetic datasets.

From Spherical to Gaussian: A Comparative Analysis of Point Cloud Cropping Strategies in Large-Scale 3D Environments

cs.CV · 2026-05-03 · unverdicted · novelty 5.0 · 2 refs

Gaussian and related cropping strategies for point cloud subclouds improve 3D neural network performance over spherical cropping on large outdoor scenes.

INSIGHT: Indoor Scene Intelligence from Geometric-Semantic Hierarchy Transfer for Public Safety

cs.CV · 2026-04-25 · unverdicted · novelty 4.0

INSIGHT transfers 2D semantic understanding from foundation models and traditional CV tools into 3D point clouds and compressed scene graphs for indoor public-safety mapping without target-domain labels.

A review on deep learning techniques for 3D sensed data classification

cs.CV · 2019-07-09 · unverdicted · novelty 1.0

A survey of deep learning architectures for 3D sensed data classification covering RGB-D, multi-view, volumetric and end-to-end methods along with datasets and future directions.

citing papers explorer

Showing 15 of 15 citing papers.

CalibAnyView: Beyond Single-View Camera Calibration in the Wild · cs.CV · 2026-05-14 · conditional · none · ref 2
Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI · cs.CV · 2021-09-16 · accept · none · ref 3
Resolving Long-Tail Ambiguity in Unsupervised 3D Point Cloud Segmentation with Language Priors · cs.CV · 2026-05-20 · unverdicted · none · ref 43
VGGT-360: Geometry-Consistent Zero-Shot Panoramic Depth Estimation · cs.CV · 2026-03-19 · unverdicted · none · ref 3
Automatic reconstruction of fully volumetric 3D building models from point clouds · cs.GR · 2019-07-01 · unverdicted · none · ref 48
PointGS: Semantic-Consistent Unsupervised 3D Point Cloud Segmentation with 3D Gaussian Splatting · cs.CV · 2026-05-12 · unverdicted · none · ref 2
BIMStruct3D: A Fully Automated Hybrid Learning Scan-to-BIM Pipeline with Integrated Topology Refinement · cs.CV · 2026-04-27 · unverdicted · none · ref 1
Holo360D: A Large-Scale Real-World Dataset with Continuous Trajectories for Advancing Panoramic 3D Reconstruction and Beyond · cs.CV · 2026-04-24 · unverdicted · none · ref 4
STRNet: Visual Navigation with Spatio-Temporal Representation through Dynamic Graph Aggregation · cs.CV · 2026-04-03 · conditional · none · ref 3
PanoSAMic: Panoramic Image Segmentation from SAM Feature Encoding and Dual View Fusion · cs.CV · 2026-01-12 · unverdicted · none · ref 1
PointCaM: Cut-and-Mix for Open-Set Point Cloud Learning · cs.CV · 2022-12-05 · unverdicted · none · ref 11
EvObj: Learning Evolving Object-centric Representations for 3D Instance Segmentation without Scene Supervision · cs.CV · 2026-05-13 · unverdicted · none · ref 1
From Spherical to Gaussian: A Comparative Analysis of Point Cloud Cropping Strategies in Large-Scale 3D Environments · cs.CV · 2026-05-03 · unverdicted · none · ref 2 · 2 links
INSIGHT: Indoor Scene Intelligence from Geometric-Semantic Hierarchy Transfer for Public Safety · cs.CV · 2026-04-25 · unverdicted · none · ref 6
A review on deep learning techniques for 3D sensed data classification · cs.CV · 2019-07-09 · unverdicted · none · ref 47
