Large multimodal models display emerging but limited spatial action capabilities in goal-oriented urban 3D navigation, remaining far from human-level performance with errors diverging rapidly after critical decision points.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
GUIDE decomposes GUI agent evaluation into trajectory segmentation, subtask diagnosis, and overall summary to deliver higher accuracy and structured error reports than holistic baselines.
Gated escalation and partitioned states enable more efficient multi-agent collaboration in Minecraft by making communication selective rather than automatic.
citing papers explorer
-
How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace
Large multimodal models display emerging but limited spatial action capabilities in goal-oriented urban 3D navigation, remaining far from human-level performance with errors diverging rapidly after critical decision points.
-
GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis
GUIDE decomposes GUI agent evaluation into trajectory segmentation, subtask diagnosis, and overall summary to deliver higher accuracy and structured error reports than holistic baselines.
-
Gated Coordination for Efficient Multi-Agent Collaboration in Minecraft Game
Gated escalation and partitioned states enable more efficient multi-agent collaboration in Minecraft by making communication selective rather than automatic.