SEMNAV: Enhancing Visual Semantic Navigation in Robotics through Semantic Segmentation

Carlos Gutiérrez-Álvarez; Francisco Javier Acevedo-Rodríguez; Rafael Flor-Rodríguez; Roberto J. López-Sastre; Sergio Lafuente-Arroyo

arxiv: 2506.01418 · v2 · pith:FAPMW3VGnew · submitted 2025-06-02 · 💻 cs.RO · cs.CV

keywords: semantic, semnav, environments, navigation, visual, dataset, environment, models

Visual Semantic Navigation (VSN) is a fundamental problem in robotics, where an agent must navigate toward a target object in an unknown environment, mainly using visual information. Most state-of-the-art VSN models are trained in simulation environments, which, at best, use rendered scenes of the real world. These approaches typically rely on raw RGB data from the virtual scenes, which limits their ability to generalize to real-world environments due to domain adaptation issues. To tackle this problem, we propose SEMNAV, a novel approach that leverages semantic segmentation as the main visual input representation of the environment to enhance the agent's perception and decision-making capabilities. By explicitly incorporating this type of high-level semantic information, our model learns robust navigation policies that improve generalization across unseen environments, in both simulated and real-world settings. We also introduce the SEMNAV dataset, a newly curated dataset designed for training semantic segmentation-aware navigation models like SEMNAV. Our approach is evaluated extensively both in simulated environments and on real-world robotic platforms. Experimental results demonstrate that SEMNAV outperforms existing state-of-the-art VSN models, achieving higher success rates in the Habitat 2.0 simulation environment using the HM3D dataset. Furthermore, our real-world experiments highlight the effectiveness of semantic segmentation in mitigating the sim-to-real gap, making our model a promising solution for practical VSN-based robotic applications. The code and datasets are accessible at https://github.com/gramuah/semnav.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Designing Privacy-Preserving Visual Perception for Robot Navigation Based on User Privacy Preferences
cs.RO 2026-04 unverdicted novelty 5.0

User studies reveal preferences for visual abstractions and distance-dependent low-resolution capture, leading to a configurable privacy policy for robot navigation.