WorldMAP bootstraps reliable trajectory prediction in vision-language navigation by converting world-model-generated futures into structured supervision, cutting ADE by 18% and FDE by 42.1% on Target-Bench while making small VLMs competitive with large ones.
Wmnav: Integrating vision-language models into world models for object goal navigation,
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3representative citing papers
ReMemNav improves zero-shot object navigation success and efficiency by integrating episodic memory and rethinking with VLMs, achieving SR/SPL gains of 1.7%/7.0% on HM3D v0.1, 18.2%/11.1% on HM3D v0.2, and 8.7%/7.9% on MP3D.
Introduces a hierarchical VLN architecture with asynchronous layers, incremental memory graph, and WTRP-based exploration that improves success and efficiency on resource-constrained robots.
citing papers explorer
-
WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models
WorldMAP bootstraps reliable trajectory prediction in vision-language navigation by converting world-model-generated futures into structured supervision, cutting ADE by 18% and FDE by 42.1% on Target-Bench while making small VLMs competitive with large ones.
-
ReMemNav: A Rethinking and Memory-Augmented Framework for Zero-Shot Object Navigation
ReMemNav improves zero-shot object navigation success and efficiency by integrating episodic memory and rethinking with VLMs, achieving SR/SPL gains of 1.7%/7.0% on HM3D v0.1, 18.2%/11.1% on HM3D v0.2, and 8.7%/7.9% on MP3D.
-
A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration
Introduces a hierarchical VLN architecture with asynchronous layers, incremental memory graph, and WTRP-based exploration that improves success and efficiency on resource-constrained robots.