Walk With Me: Long-Horizon Social Navigation for Human-Centric Outdoor Assistance

2026 · cs.RO · arXiv 2604.26839

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Assisting humans in open-world outdoor environments requires robots to translate high-level natural-language intentions into safe, long-horizon, and socially compliant navigation behavior. Existing map-based methods rely on costly pre-built HD maps, while learning-based policies are mostly limited to indoor and short-horizon settings. To bridge this gap, we propose Walk with Me, a map-free framework for long-horizon social navigation from high-level human instructions. Walk with Me leverages GPS context and lightweight candidate points-of-interest from a public map API for semantic destination grounding and waypoint proposal. A High-Level Vision-Language Model grounds abstract instructions into concrete destinations and plans coarse waypoint sequences. During execution, an observation-aware routing mechanism determines whether the Low-Level Vision-Language-Action policy can handle the current situation or whether explicit safety reasoning from the High-Level VLM is needed. Routine segments are executed by the Low-Level VLA, while complex situations such as crowded crossings trigger high-level reasoning and stop-and-wait behavior when unsafe. By combining semantic intent grounding, map-free long-horizon planning, safety-aware reasoning, and low-level action generation, Walk with Me enables practical outdoor social navigation for human-centric assistance.
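The routing step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Observation` fields, the `crowd_threshold` value, the action strings, and the `is_safe` callback are all hypothetical stand-ins, and in the paper the routing and safety decisions are made by learned models rather than hand-written rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Handler(Enum):
    LOW_LEVEL_VLA = "low_level_vla"    # routine segments
    HIGH_LEVEL_VLM = "high_level_vlm"  # explicit safety reasoning


@dataclass
class Observation:
    # Hypothetical features; the abstract does not specify the observation format.
    pedestrians_nearby: int
    at_crossing: bool


def route(obs: Observation, crowd_threshold: int = 3) -> Handler:
    """Decide which component handles the current situation (assumed rule)."""
    if obs.at_crossing or obs.pedestrians_nearby >= crowd_threshold:
        return Handler.HIGH_LEVEL_VLM
    return Handler.LOW_LEVEL_VLA


def step(obs: Observation, is_safe: Callable[[Observation], bool]) -> str:
    """Return a symbolic action for one control step."""
    if route(obs) is Handler.LOW_LEVEL_VLA:
        # Routine segment: the low-level policy follows the next waypoint.
        return "follow_waypoint"
    # Complex situation: high-level reasoning decides to proceed or wait.
    return "proceed" if is_safe(obs) else "stop_and_wait"


if __name__ == "__main__":
    quiet_path = Observation(pedestrians_nearby=0, at_crossing=False)
    busy_crossing = Observation(pedestrians_nearby=5, at_crossing=True)
    print(step(quiet_path, is_safe=lambda o: True))
    print(step(busy_crossing, is_safe=lambda o: o.pedestrians_nearby < 3))
```

The point of the split is that the more expensive high-level reasoning is invoked only when the observation suggests the low-level policy may not suffice, and unsafe situations resolve to a stop-and-wait action.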

representative citing papers

OneVLA: A Unified Framework for Embodied Tasks

cs.RO · 2026-05-31 · unverdicted · novelty 6.0

OneVLA is a unified VLA model using a shared action head and multi-stage progressive training with CoT fine-tuning that reports state-of-the-art results on both navigation and manipulation in simulation and real-world settings.

FutureNav: Unified World-Action Modeling for Vision-and-Language Navigation

cs.RO · 2026-06-29 · unverdicted · novelty 5.0

FutureNav proposes a 4B-scale VLM that jointly optimizes action prediction, inverse/forward dynamics, and future state generation for VLN and reports SOTA results on multiple benchmarks.

Vision-Language Models for Deployable Social Robot Navigation: Bridging Semantic Reasoning and Low-Level Control

cs.RO · 2026-06-27 · unverdicted · novelty 4.0

Survey organizing VLM-based social robot navigation into reasoning, planning, and bridging components with a proposed roadmap for hybrid deployable systems.

citing papers explorer

OneVLA: A Unified Framework for Embodied Tasks
cs.RO · 2026-05-31 · unverdicted · none · ref 55 · internal anchor
FutureNav: Unified World-Action Modeling for Vision-and-Language Navigation
cs.RO · 2026-06-29 · unverdicted · none · ref 15 · internal anchor
Vision-Language Models for Deployable Social Robot Navigation: Bridging Semantic Reasoning and Low-Level Control
cs.RO · 2026-06-27 · unverdicted · none · ref 78 · internal anchor
