JOPP-3D: Joint Open Vocabulary Semantic Segmentation on Point Clouds and Panoramas

Didier Stricker; Hideaki Kanayama; Jason Rambach; Kanta Shimizu; Mahdi Chamseddine; Sandeep Inuganti; Soichiro Yokota

arXiv: 2603.06168 · v3 · pith:UE3X33WR · submitted 2026-03-06 · 💻 cs.CV


Sandeep Inuganti, Hideaki Kanayama, Kanta Shimizu, Mahdi Chamseddine, Soichiro Yokota, Didier Stricker, Jason Rambach


keywords: panoramic, point, segmentation, semantic, clouds, jopp-3d, modalities, across


Abstract

Semantic segmentation across visual modalities such as 3D point clouds and panoramic images remains challenging, primarily because annotated data are scarce and fixed-label models adapt poorly to new categories. In this paper, we present JOPP-3D, an open-vocabulary segmentation framework that jointly leverages panoramic and point cloud data to enable language-driven scene understanding. We convert RGB-D panoramic images into corresponding wide field-of-view tangential perspective images and 3D point clouds, then use these modalities to extract and align features from vision-language foundation models. This allows natural language queries to generate semantic masks on both input modalities. Experimental evaluation on the Stanford-2D-3D-S and ToF-360 datasets demonstrates that JOPP-3D produces coherent and semantically meaningful segmentations across the panoramic and 3D domains. Our method achieves a significant improvement over the state of the art in both open- and closed-vocabulary 2D and 3D semantic segmentation.
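Two steps named in the abstract, turning an RGB-D panorama into a point cloud and answering a natural language query on per-point features, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes an equirectangular depth map that stores Euclidean range along each viewing ray, a longitude/latitude pixel convention chosen only for illustration, and per-point feature vectors that stand in for vision-language embeddings already aligned with the text embeddings. The function names `equirect_depth_to_points`, `cosine`, and `label_points_by_query` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def equirect_depth_to_points(depth: Sequence[Sequence[float]]) -> List[Point]:
    """Back-project an equirectangular depth map (H rows x W cols) to 3D points.

    Hypothetical conventions (assumptions, not taken from the paper):
    - each value is the Euclidean range along the viewing ray;
    - column centers map to longitude in [-pi, pi);
    - row centers map to latitude from +pi/2 (top) to -pi/2 (bottom);
    - the output frame is right-handed with z pointing up.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    h, w = len(depth), len(depth[0])
    points: List[Point] = []
    for v in range(h):
        lat = math.pi / 2 - (v + 0.5) * math.pi / h
        for u in range(w):
            r = depth[v][u]
            if r <= 0:
                continue  # invalid or missing depth
            lon = (u + 0.5) * 2 * math.pi / w - math.pi
            # Spherical-to-Cartesian conversion of the viewing ray scaled by range.
            points.append((
                r * math.cos(lat) * math.cos(lon),
                r * math.cos(lat) * math.sin(lon),
                r * math.sin(lat),
            ))
    return points


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def label_points_by_query(
    point_features: Sequence[Sequence[float]],
    text_embeddings: Sequence[Sequence[float]],
) -> List[int]:
    """Assign each point the index of the most similar text query.

    Assumes point features and text embeddings share one aligned
    embedding space (toy vectors in the example below).
    """
    labels = []
    for f in point_features:
        scores = [cosine(f, t) for t in text_embeddings]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


if __name__ == "__main__":
    # Toy 2x4 panorama at constant range 1.0: every point lies on the unit sphere.
    pts = equirect_depth_to_points([[1.0] * 4, [1.0] * 4])
    print(len(pts))  # 8

    # Toy 2-D "embeddings" for the queries ["wall", "floor"] (illustrative only).
    queries = [[1.0, 0.0], [0.0, 1.0]]
    feats = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]
    print(label_points_by_query(feats, queries))  # [0, 1, 0]
```

A semantic mask for a single query is then the set of points whose label equals that query's index. Real vision-language features would be high-dimensional and, for the panorama, computed on the tangential perspective views before being projected back; the toy vectors here only show the similarity-based assignment.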



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Panoramic Scene Analysis: A Survey from Distortion-Aware Engineering to Sphere-Native Foundation Modeling
cs.CV 2026-06 unverdicted novelty 3.0

Survey organizing panoramic scene analysis literature by architectural design and training paradigm, identifying the absence of methods achieving both strict spherical equivariance and full reuse of perspective-pretra...