OneVLA is a unified VLA model using a shared action head and multi-stage progressive training with CoT fine-tuning that reports state-of-the-art results on both navigation and manipulation in simulation and real-world settings.
Walk With Me: Long-Horizon Social Navigation for Human-Centric Outdoor Assistance
3 Pith papers cite this work. Polarity classification is still indexing.
abstract
Assisting humans in open-world outdoor environments requires robots to translate high-level natural-language intentions into safe, long-horizon, and socially compliant navigation behavior. Existing map-based methods rely on costly pre-built HD maps, while learning-based policies are mostly limited to indoor and short-horizon settings. To bridge this gap, we propose Walk with Me, a map-free framework for long-horizon social navigation from high-level human instructions. Walk with Me leverages GPS context and lightweight candidate points-of-interest from a public map API for semantic destination grounding and waypoint proposal. A High-Level Vision-Language Model grounds abstract instructions into concrete destinations and plans coarse waypoint sequences. During execution, an observation-aware routing mechanism determines whether the Low-Level Vision-Language-Action policy can handle the current situation or whether explicit safety reasoning from the High-Level VLM is needed. Routine segments are executed by the Low-Level VLA, while complex situations such as crowded crossings trigger high-level reasoning and stop-and-wait behavior when unsafe. By combining semantic intent grounding, map-free long-horizon planning, safety-aware reasoning, and low-level action generation, Walk with Me enables practical outdoor social navigation for human-centric assistance.
fields
cs.RO 3years
2026 3verdicts
UNVERDICTED 3representative citing papers
FutureNav proposes a 4B-scale VLM that jointly optimizes action prediction, inverse/forward dynamics, and future state generation for VLN and reports SOTA results on multiple benchmarks.
Survey organizing VLM-based social robot navigation into reasoning, planning, and bridging components with a proposed roadmap for hybrid deployable systems.
citing papers explorer
-
OneVLA: A Unified Framework for Embodied Tasks
OneVLA is a unified VLA model using a shared action head and multi-stage progressive training with CoT fine-tuning that reports state-of-the-art results on both navigation and manipulation in simulation and real-world settings.
-
FutureNav: Unified World-Action Modeling for Vision-and-Language Navigation
FutureNav proposes a 4B-scale VLM that jointly optimizes action prediction, inverse/forward dynamics, and future state generation for VLN and reports SOTA results on multiple benchmarks.
-
Vision-Language Models for Deployable Social Robot Navigation: Bridging Semantic Reasoning and Low-Level Control
Survey organizing VLM-based social robot navigation into reasoning, planning, and bridging components with a proposed roadmap for hybrid deployable systems.