A vision-language model outputs dual heatmaps for navigation affordance and facing to ground semantic instructions into executable free space, achieving higher affordance rates than waypoint regression across simulated robot embodiments.
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: dataset (1)
polarities: use dataset (1)
representative citing papers
GLM-5V-Turbo integrates multimodal perception as a core part of reasoning and execution for agentic tasks, reporting strong results in visual tool use and multimodal coding while keeping text-only performance competitive.
citing papers explorer
- Beyond Waypoints: Dual-Heatmap Grounding for Cross-Embodiment Semantic Navigation
  A vision-language model outputs dual heatmaps for navigation affordance and facing to ground semantic instructions into executable free space, achieving higher affordance rates than waypoint regression across simulated robot embodiments.
- GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
  GLM-5V-Turbo integrates multimodal perception as a core part of reasoning and execution for agentic tasks, reporting strong results in visual tool use and multimodal coding while keeping text-only performance competitive.