FiLM-Nav fine-tunes VLMs on a mixture of simulated navigation tasks, reaching state-of-the-art SPL and success rate on the HM3D ObjectNav and OVON benchmarks and generalizing to unseen categories.
Uni-NaVid: A video-based vision-language-action model for unifying embodied navigation tasks
1 Pith paper cites this work.
fields: cs.RO (1)
years: 2025 (1)
verdicts: UNVERDICTED (1)
representative citing papers
FiLM-Nav: Efficient and Generalizable Navigation via VLM Fine-tuning