AeroVerse: UAV-agent benchmark suite for simulating, pre-training, finetuning, and evaluating aerospace embodied world models

2024 · arXiv 2408.15511

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 · dataset 1

citation-polarity summary

background 2

representative citing papers

How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace

cs.AI · 2026-04-09 · unverdicted · novelty 7.0

Large multimodal models display emerging but limited spatial action capabilities in goal-oriented urban 3D navigation, remaining far from human-level performance with errors diverging rapidly after critical decision points.

Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models

cs.RO · 2026-04-09 · unverdicted · novelty 4.0

This survey organizes aerial vision-language navigation methods into five architectural categories, critically reviews evaluation infrastructure, and synthesizes seven open problems for LLM/VLM integration.

A Universal Large Language Model -- Drone Command and Control Interface

cs.RO · 2026-01-21 · unverdicted · novelty 4.0

A universal LLM-to-drone interface is implemented via the Model Context Protocol (MCP) and MAVLink, demonstrated with real UAV flight control and simulated flights using live map data.

AirVista-II: An Agentic System for Embodied UAVs Toward Dynamic Scene Semantic Understanding

cs.RO · 2025-04-13 · unverdicted · novelty 4.0

AirVista-II integrates agent-based task identification and scheduling, multimodal perception, and scenario-tailored keyframe extraction to deliver high-quality zero-shot semantic understanding for embodied UAVs in dynamic environments.

citing papers explorer

Showing 4 of 4 citing papers.

How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace

cs.AI · 2026-04-09 · unverdicted · none · ref 57

Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models

cs.RO · 2026-04-09 · unverdicted · none · ref 122

A Universal Large Language Model -- Drone Command and Control Interface

cs.RO · 2026-01-21 · unverdicted · none · ref 28

AirVista-II: An Agentic System for Embodied UAVs Toward Dynamic Scene Semantic Understanding

cs.RO · 2025-04-13 · unverdicted · none · ref 15
