Beyond the nav-graph: Vision-and-language navigation in continuous environments

2020

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models

cs.AI · 2026-04-09 · unverdicted · novelty 7.0

WorldMAP bootstraps reliable trajectory prediction in vision-language navigation by converting world-model-generated futures into structured supervision, cutting ADE by 18% and FDE by 42.1% on Target-Bench while making small VLMs competitive with large ones.

P2DNav: Panorama-to-Downview Reasoning for Zero-shot Vision-and-Language Navigation

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

P2DNav proposes a three-part hierarchical framework (panorama-to-downview reasoning, sliding-window dialogue memory, and reflective reorientation) that reports large success-rate gains on the R2R-CE zero-shot VLN benchmark.

Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning

cs.CV · 2025-12-09 · unverdicted · novelty 6.0

A monocular RGB-only aerial VLN framework outperforms baselines via prompt-guided multi-task learning, keyframe selection, and label reweighting on AerialVLN and OpenFly benchmarks.

citing papers explorer

Showing 3 of 3 citing papers.

WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models

cs.AI · 2026-04-09 · unverdicted · none · ref 5

P2DNav: Panorama-to-Downview Reasoning for Zero-shot Vision-and-Language Navigation

cs.CV · 2026-05-19 · unverdicted · none · ref 12

Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning

cs.CV · 2025-12-09 · unverdicted · none · ref 25
