Vision and language navigation in the real world via online visual language mapping

2023 · arXiv 2310.10822

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

cs.CV · 2024-02-24 · unverdicted · novelty 6.0

NaVid, a video-based VLM trained on 510k navigation and 763k web samples, achieves SOTA VLN performance using only monocular RGB video for next-step action planning in sim and real environments.

MAGNIFIED: RL Fine-tuning of Multimodal Large Language Models for Motion Planning

cs.RO · 2026-06-02 · unverdicted · novelty 5.0

MAGNIFIED applies RL fine-tuning to MLLMs for autonomous driving motion planning, yielding over 10.5% lower overlap rate and 38.9% lower off-road rate than SFT baseline on Waymo Open Motion Dataset.

Hierarchical Semantic-Augmented Navigation: Optimal Transport and Graph-Driven Reasoning for Vision-Language Navigation

cs.RO · 2026-06-01 · unverdicted · novelty 3.0

HSAN integrates hierarchical semantic graphs, optimal transport-based goal selection, and graph-aware RL to claim SOTA results on VLN-CE tasks.

citing papers explorer

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation
cs.CV · 2024-02-24 · unverdicted · none · ref 108
MAGNIFIED: RL Fine-tuning of Multimodal Large Language Models for Motion Planning
cs.RO · 2026-06-02 · unverdicted · none · ref 5
Hierarchical Semantic-Augmented Navigation: Optimal Transport and Graph-Driven Reasoning for Vision-Language Navigation
cs.RO · 2026-06-01 · unverdicted · none · ref 10