pith. sign in

arxiv: 2603.28691 · v2 · pith:VKUYKQ6Hnew · submitted 2026-03-30 · 💻 cs.RO

DRIVE-Nav: Directional Reasoning, Inspection, and Verification for Efficient Open-Vocabulary Navigation

classification 💻 cs.RO
keywords drive-navdirectionsinspectionbestdirectionalefficiencyframeworkhm3d-ovon
0
0 comments X
read the original abstract

Open-Vocabulary Object Navigation (OVON) requires an embodied agent to locate a language-specified target in unknown environments. Many zero-shot methods rely on frontier-candidate reasoning under incomplete observations, while topology-aware methods reduce candidate redundancy but may still introduce panoramic inspection overhead and repeated reconsideration. We present DRIVE-Nav, a structured framework that organizes exploration around persistent directions rather than raw frontiers. By inspecting encountered directions more completely and restricting subsequent decisions to still-relevant directions within a forward 240-degree view range, DRIVE-Nav reduces redundant revisits and improves path efficiency. The framework extracts and tracks directional candidates from weighted Fast Marching Method (FMM) paths, maintains representative views for semantic inspection, and combines vision-language-guided prompt enrichment with cross-frame verification to improve grounding reliability. Experiments on HM3D-OVON, HM3Dv1, HM3Dv2, and MP3D demonstrate strong overall performance and consistent efficiency gains. On HM3D-OVON, DRIVE-Nav achieves 50.2% SR and 32.6% SPL, improving the previous best method by 1.9% SR and 5.6% SPL. It also delivers the best SPL on HM3Dv1, HM3Dv2, and MP3D and transfers to a physical humanoid robot. Real-world deployment also demonstrates its effectiveness.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SpaceVLN: A Zero-Shot Vision-and-Language Navigation Agent with Online Spatial Cognitive Memory and Reasoning

    cs.RO 2026-06 unverdicted novelty 6.0

    SpaceVLN proposes a stagewise closed-loop framework using Spatial Cognitive Memory and Spatial-CoT for zero-shot vision-and-language navigation and object-goal navigation, reporting SOTA results on R2R-CE, RxR-CE, GN-...