ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation
2 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
Verdicts: unverdicted 2
Roles: background 1
Polarities: background 1
citing papers explorer
- FiLM-Nav: Efficient and Generalizable Navigation via VLM Fine-tuning
  FiLM-Nav fine-tunes VLMs on a mixture of simulated navigation tasks to reach state-of-the-art SPL and success on HM3D ObjectNav and OVON benchmarks with generalization to unseen categories.
- NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation
  NaVid, a video-based VLM trained on 510k navigation and 763k web samples, achieves SOTA VLN performance using only monocular RGB video for next-step action planning in sim and real environments.