VAP-TAMP combines action knowledge, vision-language models for active view selection, and scene-graph reasoning to let robots perceive and resolve unforeseen execution-time situations during task and motion planning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Robots often fail when something unexpected happens, such as a door jamming or an object falling. VAP-TAMP tries to fix this by letting the robot actively choose what to look at next using a vision-language model prompted by its own action knowledge. It then builds a scene graph to reason about both the high-level task and the low-level motions needed to recover. The system was tested on service tasks both in simulation and on a real mobile manipulator.
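The loop described above can be sketched as a minimal, hypothetical program: on an action failure, use the failed action's precondition and effect symbols to pick the most relevant camera view, ask a VLM-style assessor what it sees there, and fold the answer into a scene graph as relation triples. All names here (`select_view`, `handle_failure`, the view and action dictionaries) are illustrative assumptions, not the paper's actual API; the "VLM" is a stub.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    """Objects as nodes, (subject, relation, object) triples as edges."""
    edges: set = field(default_factory=set)

    def add(self, subj, rel, obj):
        self.edges.add((subj, rel, obj))

    def holds(self, subj, rel, obj):
        return (subj, rel, obj) in self.edges

def select_view(action, views):
    # Action knowledge heuristic: look at the regions the failed
    # action's preconditions and effects mention.
    relevant = set(action["preconditions"]) | set(action["effects"])
    return max(views, key=lambda v: len(relevant & set(v["covers"])))

def handle_failure(action, views, assess, graph):
    """On action failure: pick an informative view, ask the assessor
    (standing in for a prompted VLM) what it sees there, and fold the
    answer into the scene graph for replanning."""
    view = select_view(action, views)
    for subj, rel, obj in assess(view):
        graph.add(subj, rel, obj)
    return view

# Stub "VLM" that reports a fallen cup in whatever view it is shown.
def fake_assess(view):
    return [("cup", "fallen_on", "floor")]

action = {"preconditions": ["cup"], "effects": ["table"]}
views = [{"name": "door_cam", "covers": ["door"]},
         {"name": "table_cam", "covers": ["table", "cup"]}]
graph = SceneGraph()
chosen = handle_failure(action, views, fake_assess, graph)
print(chosen["name"])                            # table_cam
print(graph.holds("cup", "fallen_on", "floor"))  # True
```

In this toy version the view with the largest overlap with the action's symbols wins, so the table camera is chosen over the door camera; the real system presumably scores views with the VLM itself rather than by set overlap.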
Core claim
We develop a planning and situation-handling framework, called VAP-TAMP, that enables robots to actively perceive and address unforeseen situations during plan execution. VAP-TAMP leverages action knowledge to strategically prompt vision-language models for active view selection and situation assessment, while constructing and reasoning over scene graphs for integrated task and motion planning.
Load-bearing premise
That prompting vision-language models with action knowledge yields reliable situation assessments, and that scene graphs can be constructed and reasoned over in real time without excessive error propagation during recovery.
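The error-propagation half of this premise can be made concrete with a toy example, not drawn from the paper: if recovery is selected from scene-graph edges, a single wrong VLM-derived edge changes the recovery plan outright. The relation names and recovery actions below are hypothetical.

```python
def choose_recovery(edges):
    """Toy recovery selector keyed on scene-graph edges; one wrong
    edge from a misread view flips the chosen recovery action."""
    if ("door", "state", "jammed") in edges:
        return "push_door"
    if ("cup", "fallen_on", "floor") in edges:
        return "pick_up_cup"
    return "continue"

# Correct assessment leads to the right recovery...
print(choose_recovery({("cup", "fallen_on", "floor")}))  # pick_up_cup
# ...while a misreading (door jammed instead of cup fallen) yields a
# different, possibly harmful one.
print(choose_recovery({("door", "state", "jammed")}))    # push_door
```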
Read the original abstract
Current robots are capable of computing plans to accomplish complex tasks. However, real-world environments are inherently open and dynamic, and unforeseen situations frequently arise during plan execution, such as jamming doors and fallen objects on the floor. These situations may result from the robot's own action failures or from external disturbances, such as human activities. Detecting and handling such execution-time situations remains a significant challenge, limiting those robots' ability to achieve long-term autonomy. In this paper, we develop a planning and situation-handling framework, called VAP-TAMP, that enables robots to actively perceive and address unforeseen situations during plan execution. VAP-TAMP leverages action knowledge to strategically prompt vision-language models for active view selection and situation assessment, while constructing and reasoning over scene graphs for integrated task and motion planning. We evaluated VAP-TAMP using service tasks in simulation and on a mobile manipulation platform.
Editorial analysis
A structured set of objections, weighed in public.
Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.
Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The framework implicitly assumes reliable VLM outputs and tractable scene-graph construction, but these are not formalized.
pith-pipeline@v0.9.0 · 5483 in / 1038 out tokens · 46977 ms · 2026-05-07T12:21:42.476133+00:00 · methodology