arXiv preprint arXiv:2410.09604

Chen Gao, Baining Zhao, Weichen Zhang, Jinzhu Mao, Jun Zhang, Zhiheng Zheng, Fanhang Man, Jianjie Fang, Zile Zhou, Jinqiang Cui, Xinlei Chen, Yong Li · 2024 · arXiv 2410.09604

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · dataset: 1

citation-polarity summary

background: 2

representative citing papers

AirGroundBench: Probing Spatial Intelligence in Multimodal Large Models under Heterogeneous Multi-View Embodied Collaboration

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

AirGroundBench is a new diagnostic benchmark exposing that MLLMs handle basic spatial perception but struggle with cross-view alignment, transformation reasoning, and embodied navigation under heterogeneous air-ground views.

SpatialUAV: Benchmarking Spatial Intelligence for Low-Altitude UAV Perception, Collaboration, and Motion

cs.CV · 2026-06-26 · accept · novelty 7.0 · 2 refs

SpatialUAV releases a new multi-task benchmark for low-altitude UAV spatial intelligence and demonstrates that existing VLMs exhibit clear weaknesses in cross-view association and geometric reasoning.

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

cs.AI · 2026-06-08 · unverdicted · novelty 7.0

SpatialWorld is a new multi-simulator benchmark showing top multimodal agents achieve under 18% success on interactive spatial tasks requiring active exploration and long-horizon planning.

How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace

cs.AI · 2026-04-09 · unverdicted · novelty 7.0

Large multimodal models display emerging but limited spatial action capabilities in goal-oriented urban 3D navigation, remaining far from human-level performance with errors diverging rapidly after critical decision points.

NormAct: A Benchmark for Hidden Social Norm Compliance in Embodied Planning

cs.AI · 2026-06-26 · unverdicted · novelty 6.0

NormAct shows MLLMs reach explicit goals in 67.3% of cases but comply with hidden norms in only 26.4%, with NormPerceptor raising task success from 24.2% to 46.7%.

Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models

cs.RO · 2026-04-09 · unverdicted · novelty 4.0

This survey organizes aerial vision-language navigation methods into five architectural categories, critically reviews evaluation infrastructure, and synthesizes seven open problems for LLM/VLM integration.

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

cs.AI · 2025-07-28 · accept · novelty 4.0

The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.

citing papers explorer

Showing 5 of 5 citing papers after filters.

AirGroundBench: Probing Spatial Intelligence in Multimodal Large Models under Heterogeneous Multi-View Embodied Collaboration

cs.CV · 2026-06-26 · unverdicted · none · ref 13

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

cs.AI · 2026-06-08 · unverdicted · none · ref 22

How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace

cs.AI · 2026-04-09 · unverdicted · none · ref 13

NormAct: A Benchmark for Hidden Social Norm Compliance in Embodied Planning

cs.AI · 2026-06-26 · unverdicted · none · ref 35

Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models

cs.RO · 2026-04-09 · unverdicted · none · ref 115
