P3nav: End-to-end perception, prediction and planning for vision-and-language navigation
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.RO (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: baseline (1)
polarities: baseline (1)
citing papers explorer
- HCSG: Human-Centric Semantic-Geometric Reasoning for Vision-Language Navigation
  HCSG combines geometric forecasting of human pose and trajectory with VLM-generated semantic descriptions of intentions, fused into a topological map with a social distance loss, yielding 14% higher success rate and 34% lower collision rate on the HA-VLNCE benchmark.
- SEDualVLN: A Spatially-Enhanced Dual-System for Vision-Language Navigation
  SEDualVLN proposes a spatially-enhanced dual-system VLN framework that pairs a fast VLM action generator with a slow MLLM waypoint planner and reports state-of-the-art results on VLN-CE benchmarks.
- What Limits Vision-and-Language Navigation?
  StereoNav reaches new benchmark highs on R2R-CE and RxR-CE and improves real-robot reliability by supplying persistent target-location priors and stereo-derived geometry that stay stable under lighting changes and blur.