2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.RO 2
verdicts: UNVERDICTED 2
representative citing papers
MapNav uses annotated semantic maps as memory for VLN agents, claiming SOTA results in simulation and real-world tests while promising code and data release.
citing papers explorer
- VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
VP-VLA decouples high-level reasoning from low-level control in VLA models by rendering spatial anchors as visual prompts directly in the RGB observation space, outperforming end-to-end baselines.
- MapNav: A Novel Memory Representation via Annotated Semantic Maps for Vision-and-Language Navigation
MapNav uses annotated semantic maps as memory for VLN agents, claiming SOTA results in simulation and real-world tests while promising code and data release.