Dreamwalker: Mental planning for continuous vision-language navigation
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts: 2 unverdicted
citing papers explorer
- WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models
  WorldMAP bootstraps reliable trajectory prediction in vision-language navigation by converting world-model-generated futures into structured supervision, cutting ADE by 18% and FDE by 42.1% on Target-Bench while making small VLMs competitive with large ones.
- Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning
  A monocular RGB-only aerial VLN framework outperforms baselines via prompt-guided multi-task learning, keyframe selection, and label reweighting on AerialVLN and OpenFly benchmarks.