ETP-R1: Evolving Topological Planning with Reinforcement Fine-tuning for Vision-Language Navigation in Continuous Environments
5 Pith papers cite this work.
Citing papers by year: 2026 (5)
citing papers explorer
- VLN-Cache: Enabling Token Caching for VLN Models with Visual/Semantic Dynamics Awareness
VLN-Cache delivers up to 1.52x faster inference in VLN models by using view-aligned remapping for geometric consistency and a task-relevance saliency filter to manage semantic changes during navigation.
- FreqCache: Accelerating Embodied VLN Models with Adaptive Frequency-Guided Token Caching
FreqCache uses frequency domain properties to adaptively select, refresh, and budget token caches in VLN models, delivering 1.59x speedup with negligible overhead.
- GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents
GROW decomposes trajectories into state-action samples to enable GRPO for multi-turn VLM agents and reports state-of-the-art results on more than 800 Minecraft tasks.
- What Limits Vision-and-Language Navigation?
StereoNav reaches new benchmark highs on R2R-CE and RxR-CE and improves real-robot reliability by supplying persistent target-location priors and stereo-derived geometry that stay stable under lighting changes and blur.
- LCGNav: Local Candidate-Aware Geometric Enhancement for General Topological Planning in Vision-Language Navigation
LCGNav improves online topological VLN-CE by converting local depth views to physically truncated 3D point clouds and applying selective dimension-preserving fusion, yielding consistent gains on R2R-CE and RxR-CE benchmarks with open code.