Robust navigation with language pretraining and stochastic sampling

Xiujun Li, Chunyuan Li, Qiaolin Xia, Yonatan Bisk, Asli Celikyilmaz, Jianfeng Gao, Noah Smith, Yejin Choi · 1909 · arXiv 1909.02244

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

cs.CV · 2024-02-24 · unverdicted · novelty 6.0

NaVid, a video-based VLM trained on 510k navigation and 763k web samples, achieves SOTA VLN performance using only monocular RGB video for next-step action planning in sim and real environments.

The Essence of Balance for Self-Improving Agents in Vision-and-Language Navigation

cs.CV · 2026-04-21 · unverdicted · novelty 5.0

SDB balances behavioral diversity and learning stability in VLN self-improvement by expanding decisions into latent hypotheses, performing reliability-aware aggregation, and applying a regularizer, yielding gains such as SPL 33.73 to 35.93 on REVERIE val-unseen.

MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices

cs.CV · 2023-12-28 · unverdicted · novelty 5.0

MobileVLM achieves on-par performance with much larger vision-language models on standard benchmarks while delivering state-of-the-art inference speeds of 21.5 tokens per second on Snapdragon 888 CPU and 65.3 on Jetson Orin GPU.

citing papers explorer

Showing 3 of 3 citing papers.

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation cs.CV · 2024-02-24 · unverdicted · none · ref 55
NaVid, a video-based VLM trained on 510k navigation and 763k web samples, achieves SOTA VLN performance using only monocular RGB video for next-step action planning in sim and real environments.
The Essence of Balance for Self-Improving Agents in Vision-and-Language Navigation cs.CV · 2026-04-21 · unverdicted · none · ref 46
SDB balances behavioral diversity and learning stability in VLN self-improvement by expanding decisions into latent hypotheses, performing reliability-aware aggregation, and applying a regularizer, yielding gains such as SPL 33.73 to 35.93 on REVERIE val-unseen.
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices cs.CV · 2023-12-28 · unverdicted · none · ref 69
MobileVLM achieves on-par performance with much larger vision-language models on standard benchmarks while delivering state-of-the-art inference speeds of 21.5 tokens per second on Snapdragon 888 CPU and 65.3 on Jetson Orin GPU.

Robust navigation with language pretraining and stochastic sampling

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer