VLN-Cache delivers up to 1.52x faster inference in VLN models by using view-aligned remapping for geometric consistency and a task-relevance saliency filter to manage semantic changes during navigation.
arXiv preprint arXiv:2512.10310
5 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (5)
roles: baseline (1)
polarities: baseline (1)
citing papers explorer
- VLN-Cache: Enabling Token Caching for VLN Models with Visual/Semantic Dynamics Awareness
  VLN-Cache delivers up to 1.52x faster inference in VLN models by using view-aligned remapping for geometric consistency and a task-relevance saliency filter to manage semantic changes during navigation.
- Turning Adaptation into Assets: Cross-Domain Bridging for Online Vision-Language Navigation
  IDEA is a TTA framework for VLN that builds a dynamic asset library from Fisher-weighted soft prompts and domain coordinates, then uses convex-hull projection for cross-domain bridging and training-free adaptation.
- PanoWorld: Towards Spatial Supersensing in 360° Panorama World
  PanoWorld adds spherical spatial cross-attention and pano-native training data to MLLMs for improved spatial reasoning on ERP panoramas, outperforming baselines on new and existing benchmarks.
- FreqCache: Accelerating Embodied VLN Models with Adaptive Frequency-Guided Token Caching
  FreqCache uses frequency domain properties to adaptively select, refresh, and budget token caches in VLN models, delivering 1.59x speedup with negligible overhead.
- What Limits Vision-and-Language Navigation?
  StereoNav reaches new benchmark highs on R2R-CE and RxR-CE and improves real-robot reliability by supplying persistent target-location priors and stereo-derived geometry that stay stable under lighting changes and blur.