Multimodal large lan- guage model for visual navigation

Yao-Hung Hubert Tsai, Vansh Dhar, Jialu Li, Bowen Zhang, Jian Zhang · 2023 · arXiv 2310.08669

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

ImagineNav++: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination

cs.RO · 2025-12-19 · conditional · novelty 6.0

ImagineNav++ achieves SOTA mapless visual navigation by prompting VLMs to select imagined future views generated from a human-preference-distilled module and maintained via selective foveation memory.

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

cs.CV · 2024-02-24 · unverdicted · novelty 6.0

NaVid, a video-based VLM trained on 510k navigation and 763k web samples, achieves SOTA VLN performance using only monocular RGB video for next-step action planning in sim and real environments.

citing papers explorer

Showing 2 of 2 citing papers.

ImagineNav++: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination cs.RO · 2025-12-19 · conditional · none · ref 42
ImagineNav++ achieves SOTA mapless visual navigation by prompting VLMs to select imagined future views generated from a human-preference-distilled module and maintained via selective foveation memory.
NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation cs.CV · 2024-02-24 · unverdicted · none · ref 98
NaVid, a video-based VLM trained on 510k navigation and 763k web samples, achieves SOTA VLN performance using only monocular RGB video for next-step action planning in sim and real environments.

Multimodal large lan- guage model for visual navigation

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer