GA-VLN builds a geometry-aware BEV representation from RGB-D inputs plus 3D foundation model features to deliver state-of-the-art vision-language navigation using only navigation data.
Monodream: Monocular vision-language navigation with panoramic dreaming.arXiv preprint arXiv:2508.02549, 2025
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
A monocular RGB-only aerial VLN framework outperforms baselines via prompt-guided multi-task learning, keyframe selection, and label reweighting on AerialVLN and OpenFly benchmarks.
Semantic progress reasoning predicts instruction-style advancement from visual history to guide policies, yielding state-of-the-art success and efficiency on R2R-CE and RxR-CE.
citing papers explorer
-
GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation
GA-VLN builds a geometry-aware BEV representation from RGB-D inputs plus 3D foundation model features to deliver state-of-the-art vision-language navigation using only navigation data.
-
Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning
A monocular RGB-only aerial VLN framework outperforms baselines via prompt-guided multi-task learning, keyframe selection, and label reweighting on AerialVLN and OpenFly benchmarks.
-
Progress-Think: Semantic Progress Reasoning for Vision-Language Navigation
Semantic progress reasoning predicts instruction-style advancement from visual history to guide policies, yielding state-of-the-art success and efficiency on R2R-CE and RxR-CE.