AirGroundBench is a new diagnostic benchmark exposing that MLLMs handle basic spatial perception but struggle with cross-view alignment, transformation reasoning, and embodied navigation under heterogeneous air-ground views.
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments
4 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 4
roles: background 1
polarities: background 1
representative citing papers
A new diagnostic benchmark decomposes LLM spatial navigation into three cognitive scales and shows that cross-scale aggregation, not single-level deficits, causes failure beyond small mazes.
SCOPE introduces an edge-deployable natural-language PTZ camera agent, a simulation benchmark, and evaluations showing that stronger small language models reduce hallucinations while perception remains the main bottleneck.
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
citing papers explorer
- Lost in Aggregation: A Multi-Scale Diagnostic Benchmark for LLM Spatial Navigation
A new diagnostic benchmark decomposes LLM spatial navigation into three cognitive scales and shows that cross-scale aggregation, not single-level deficits, causes failure beyond small mazes.