DeCoNav: Dialog enhanced Long-Horizon Collaborative Vision-Language Navigation

· 2026 · cs.RO · arXiv 2604.12486

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

open full Pith review browse 3 citing papers arXiv PDF

abstract

Long-horizon collaborative vision-language navigation (VLN) is critical for multi-robot systems to accomplish complex tasks beyond the capability of a single agent. CoNavBench takes a first step by introducing the first collaborative long-horizon VLN benchmark with relay-style multi-robot tasks, a collaboration taxonomy, along with graph-grounded generation and evaluation to model handoffs and rendezvous in shared environments. However, existing benchmarks and evaluations often do not enforce strictly synchronized dual-robot rollout on a shared world timeline, and they typically rely on static coordination policies that cannot adapt when new cross-agent evidence emerges. We present Dialog enhanced Long-Horizon Collaborative Vision-Language Navigation (DeCoNav), a decentralized framework that couples event-triggered dialogue with dynamic task allocation and replanning for real-time, adaptive coordination. In DeCoNav, robots exchange compact semantic states via dialogue without a central controller. When informative events such as new evidence, uncertainty, or conflicts arise, dialogue is triggered to dynamically reassign subgoals and replan under synchronized execution. Implemented in DeCoNavBench with 1,213 tasks across 176 HM3D scenes, DeCoNav improves the both-success rate (BSR) by 69.2%, demonstrating the effectiveness of dialogue-driven, dynamically reallocated planning for multi-robot collaboration.

representative citing papers

Seeing Together: Multi-Robot Cooperative Egocentric Spatial Reasoning with Multimodal Large Language Models

cs.CV · 2026-05-18 · conditional · novelty 7.0

SP-CoR is a multimodal LLM framework using dynamics-aware sampling, spectral-physics view fusion, and prompt distillation that outperforms baselines on the new CoopSR benchmark and EgoTeam dataset for multi-robot cooperative spatial reasoning.

SpaceVLN: A Zero-Shot Vision-and-Language Navigation Agent with Online Spatial Cognitive Memory and Reasoning

cs.RO · 2026-06-08 · unverdicted · novelty 6.0

SpaceVLN proposes a stagewise closed-loop framework using Spatial Cognitive Memory and Spatial-CoT for zero-shot vision-and-language navigation and object-goal navigation, reporting SOTA results on R2R-CE, RxR-CE, GN-Bench, and HM3D-OVON plus real-robot tests.

GN0: Toward a Unified Paradigm for Generation, Evaluation, and Policy Learning in Visual-Language Navigation

cs.RO · 2026-06-02 · unverdicted · novelty 4.0

GN0 curates GN-Matrix dataset, builds 3DGS simulator and GN-Bench, and trains BAE model via supervised learning plus DAgger and RL to unify VLN tasks and outperform prior methods on GN-Bench and VLN-CE.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Seeing Together: Multi-Robot Cooperative Egocentric Spatial Reasoning with Multimodal Large Language Models cs.CV · 2026-05-18 · conditional · none · ref 74 · internal anchor
SP-CoR is a multimodal LLM framework using dynamics-aware sampling, spectral-physics view fusion, and prompt distillation that outperforms baselines on the new CoopSR benchmark and EgoTeam dataset for multi-robot cooperative spatial reasoning.
SpaceVLN: A Zero-Shot Vision-and-Language Navigation Agent with Online Spatial Cognitive Memory and Reasoning cs.RO · 2026-06-08 · unverdicted · none · ref 28 · internal anchor
SpaceVLN proposes a stagewise closed-loop framework using Spatial Cognitive Memory and Spatial-CoT for zero-shot vision-and-language navigation and object-goal navigation, reporting SOTA results on R2R-CE, RxR-CE, GN-Bench, and HM3D-OVON plus real-robot tests.
GN0: Toward a Unified Paradigm for Generation, Evaluation, and Policy Learning in Visual-Language Navigation cs.RO · 2026-06-02 · unverdicted · none · ref 14 · internal anchor
GN0 curates GN-Matrix dataset, builds 3DGS simulator and GN-Bench, and trains BAE model via supervised learning plus DAgger and RL to unify VLN tasks and outperform prior methods on GN-Bench and VLN-CE.

DeCoNav: Dialog enhanced Long-Horizon Collaborative Vision-Language Navigation

fields

years

verdicts

representative citing papers

citing papers explorer