DRIVE-Nav: Directional Reasoning, Inspection, and Verification for Efficient Open-Vocabulary Navigation

Jinhui Zhang; Longze Yuan; Maoguo Gao; Suli Zou; Zejun Zhu; Zhengwei Ma; Zhigang Gao; Zhiming Sun; Zhongjing Ma

arxiv: 2603.28691 · v2 · pith:VKUYKQ6Hnew · submitted 2026-03-30 · 💻 cs.RO

classification 💻 cs.RO

keywords drive-nav, directions, inspection, best, directional, efficiency, framework, hm3d-ovon

Open-Vocabulary Object Navigation (OVON) requires an embodied agent to locate a language-specified target in an unknown environment. Many zero-shot methods rely on frontier-candidate reasoning under incomplete observations. Topology-aware methods reduce candidate redundancy, but they can still incur panoramic inspection overhead and repeatedly reconsider the same candidates. We present DRIVE-Nav, a structured framework that organizes exploration around persistent directions rather than raw frontiers. By inspecting encountered directions more completely and restricting subsequent decisions to still-relevant directions within a forward 240-degree view range, DRIVE-Nav reduces redundant revisits and improves path efficiency. The framework extracts and tracks directional candidates from weighted Fast Marching Method (FMM) paths, maintains representative views for semantic inspection, and combines vision-language-guided prompt enrichment with cross-frame verification to improve grounding reliability. Experiments on HM3D-OVON, HM3Dv1, HM3Dv2, and MP3D demonstrate strong overall performance and consistent efficiency gains. On HM3D-OVON, DRIVE-Nav achieves a 50.2% success rate (SR) and 32.6% Success weighted by Path Length (SPL), improving on the previous best method by 1.9% SR and 5.6% SPL. It also delivers the best SPL on HM3Dv1, HM3Dv2, and MP3D, and deployment on a physical humanoid robot shows that it transfers to the real world.
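The abstract reports results as SR and SPL. As a rough illustration of how these two metrics are commonly computed in embodied navigation, here is a minimal sketch using the standard definition of SPL (per-episode success weighted by the ratio of shortest-path length to the length of the path actually taken). The function names and the toy episode values are assumptions made for this example; they are not taken from the paper or its evaluation code.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the target."""
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over all episodes.

    Each successful episode contributes l / max(p, l), where l is the
    shortest-path length to the goal and p is the length of the path the
    agent actually travelled. Failed episodes contribute 0.
    """
    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denom = max(taken, shortest)
        if ok and denom > 0:
            total += shortest / denom
    return total / len(successes)


# Toy data (hypothetical): three episodes, two of them successful.
episode_success = [True, True, False]
episode_shortest = [10.0, 5.0, 8.0]   # shortest-path lengths in metres
episode_taken = [10.0, 10.0, 20.0]    # lengths of the paths actually taken

sr_value = success_rate(episode_success)
spl_value = spl(episode_success, episode_shortest, episode_taken)
```

Under this definition SPL can never exceed SR, so a method that reduces redundant revisits narrows the gap between the two numbers without necessarily changing how often it succeeds.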

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SpaceVLN: A Zero-Shot Vision-and-Language Navigation Agent with Online Spatial Cognitive Memory and Reasoning
cs.RO 2026-06 unverdicted novelty 6.0

SpaceVLN proposes a stagewise closed-loop framework using Spatial Cognitive Memory and Spatial-CoT for zero-shot vision-and-language navigation and object-goal navigation, reporting SOTA results on R2R-CE, RxR-CE, GN-...