AerialClaw: An Open-Source Framework for LLM-Driven Autonomous Aerial Agents

Chengwei Yan; Di Wang; Gang Liu; Guo Yu; Jianfei Yang; Ke Li; Luyao Zhang; Nan Luo; Quan Wang; Xiao Gao

arxiv: 2606.12142 · v1 · pith:GNCIXA4Pnew · submitted 2026-06-10 · 💻 cs.RO · cs.CV

Ke Li, Jianfei Yang, Luyao Zhang, Guo Yu, Chengwei Yan, Yuan Ding, Di Wang, Nan Luo, Gang Liu, Xiao Gao, Quan Wang

Pith reviewed 2026-06-27 09:18 UTC · model grok-4.3

keywords autonomous UAVs · LLM agents · aerial robotics · open-source framework · closed-loop control · natural language missions · runtime validation

The pith

AerialClaw lets an LLM direct UAV missions by parsing natural language, calling skills, and revising plans from runtime feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AerialClaw as an open-source framework that converts UAVs from command-following platforms into agents capable of interpreting natural-language missions. The system maintains context, selects from libraries of hard and soft aerial skills, receives perception and execution feedback, and revises its decisions over repeated closed-loop cycles. This approach replaces manual assembly of perception, planning, and safety modules with a single reusable architecture that supports multiple simulators and staged deployment toward real hardware. A reader would care because it promises to make inspection, search, and monitoring tasks programmable in ordinary language rather than through a custom pipeline for every new scenario.

Core claim

Given a natural-language mission, AerialClaw enables an LLM-based agent to understand the task, maintain context, invoke executable aerial skills, observe perception and runtime feedback, and iteratively update its decisions in a closed loop through a modular brain-skill-runtime architecture that includes document-driven state, memory-driven reflection, and safety-oriented validation.
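The closed loop described in the core claim can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `stub_brain` is a rule-based stand-in for the LLM, the three mock skills return canned observations, and none of the names come from the AerialClaw codebase.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout; not AerialClaw's actual API.


@dataclass
class Observation:
    altitude_m: float
    target_seen: bool


@dataclass
class AgentContext:
    mission: str
    # Each entry pairs the skill that was invoked with the feedback it produced.
    history: List[Tuple[str, Observation]] = field(default_factory=list)


def stub_brain(ctx: AgentContext) -> str:
    """Stand-in for the LLM: choose the next skill from the context so far."""
    if not ctx.history:
        return "takeoff"
    _, last_obs = ctx.history[-1]
    return "land" if last_obs.target_seen else "scan"


def make_mock_skills() -> Dict[str, Callable[[], Observation]]:
    """Mock skills with canned feedback; the target appears on the second scan."""
    scans = {"count": 0}

    def takeoff() -> Observation:
        return Observation(altitude_m=10.0, target_seen=False)

    def scan() -> Observation:
        scans["count"] += 1
        return Observation(altitude_m=10.0, target_seen=scans["count"] >= 2)

    def land() -> Observation:
        return Observation(altitude_m=0.0, target_seen=True)

    return {"takeoff": takeoff, "scan": scan, "land": land}


def run_mission(mission: str, max_steps: int = 10) -> List[str]:
    """Decide, act, observe, and repeat until the agent lands or the step budget runs out."""
    ctx = AgentContext(mission=mission)
    skills = make_mock_skills()
    trace: List[str] = []
    for _ in range(max_steps):
        choice = stub_brain(ctx)
        observation = skills[choice]()
        ctx.history.append((choice, observation))
        trace.append(choice)
        if choice == "land":
            break
    return trace


if __name__ == "__main__":
    print(run_mission("Find the red marker and land"))
```

The step budget (`max_steps`) is a design choice of this sketch, not something the paper specifies; it keeps a loop driven by an unreliable decision-maker from running indefinitely.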

What carries the argument

The modular brain-skill-runtime architecture that separates LLM reasoning from hard executable skills, Markdown soft skills, document-driven agent state, and runtime validation adapters.
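One way to picture the split between hard and soft skills is a registry that holds callables alongside parsed Markdown documents. This is a sketch under stated assumptions: the Markdown layout (`# name` heading, `- step` bullets), the `SkillRegistry` class, and the skill names are all hypothetical. In the framework itself the LLM presumably reads soft-skill documents as guidance; the mechanical parser below only stands in for that step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical names throughout; not AerialClaw's actual API.


@dataclass
class SoftSkill:
    name: str
    steps: List[str]  # names of hard skills, in execution order


def parse_soft_skill(markdown: str) -> SoftSkill:
    """Read a soft skill from Markdown: the first '# ' heading is the name, '- ' lines are steps."""
    name = ""
    steps: List[str] = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not name:
            name = line[2:].strip()
        elif line.startswith("- "):
            steps.append(line[2:].strip())
    if not name:
        raise ValueError("soft skill document needs a '# name' heading")
    return SoftSkill(name=name, steps=steps)


class SkillRegistry:
    def __init__(self) -> None:
        self.hard: Dict[str, Callable[[], str]] = {}
        self.soft: Dict[str, SoftSkill] = {}

    def register_hard(self, name: str, fn: Callable[[], str]) -> None:
        self.hard[name] = fn

    def register_soft(self, markdown: str) -> SoftSkill:
        skill = parse_soft_skill(markdown)
        # Reject documents that reference hard skills the registry does not know.
        unknown = [s for s in skill.steps if s not in self.hard]
        if unknown:
            raise KeyError(f"soft skill '{skill.name}' uses unknown hard skills: {unknown}")
        self.soft[skill.name] = skill
        return skill

    def run(self, name: str) -> List[str]:
        """Run a hard skill directly, or expand a soft skill into its hard-skill steps."""
        if name in self.hard:
            return [self.hard[name]()]
        return [self.hard[step]() for step in self.soft[name].steps]


PERIMETER_SWEEP_MD = """\
# perimeter_sweep
Fly up, look around, and come back down.
- takeoff
- rotate_360
- land
"""

registry = SkillRegistry()
registry.register_hard("takeoff", lambda: "took off")
registry.register_hard("rotate_360", lambda: "rotated")
registry.register_hard("land", lambda: "landed")
registry.register_soft(PERIMETER_SWEEP_MD)
```

Checking soft-skill steps against the registered hard skills at registration time means a malformed document fails early rather than mid-flight.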

If this is right

UAV applications can accept varied natural-language instructions without developers rewriting pipelines for each new task.
New skills can be added as code modules or Markdown documents and become immediately available to the agent without altering the decision loop.
The same agent code runs unchanged across mock execution, PX4 SITL with Gazebo, and AirSim environments before physical deployment.
Runtime validation catches invalid or unsafe commands before they reach the flight controller, supporting safer operation in simulation and on hardware.
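The last two points, a shared execution interface and validation ahead of the flight controller, can be sketched together. The command shape, the limit values, and the names `ExecutionAdapter`, `MockAdapter`, and `validate` are assumptions for illustration; the paper does not state which checks its runtime validation performs.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Protocol

# Hypothetical sketch; limits, names, and the command shape are assumptions.


@dataclass(frozen=True)
class GotoCommand:
    x_m: float  # east offset from home, metres
    y_m: float  # north offset from home, metres
    altitude_m: float


@dataclass(frozen=True)
class SafetyLimits:
    max_altitude_m: float = 120.0
    geofence_radius_m: float = 200.0


class ExecutionAdapter(Protocol):
    def send(self, command: GotoCommand) -> None: ...


class MockAdapter:
    """Records commands instead of flying; a simulator adapter would expose the same method."""

    def __init__(self) -> None:
        self.sent: List[GotoCommand] = []

    def send(self, command: GotoCommand) -> None:
        self.sent.append(command)


def validate(command: GotoCommand, limits: SafetyLimits) -> Optional[str]:
    """Return a rejection reason, or None when the command is within limits."""
    values = (command.x_m, command.y_m, command.altitude_m)
    if not all(math.isfinite(v) for v in values):
        return "non-finite coordinate"
    if command.altitude_m < 0 or command.altitude_m > limits.max_altitude_m:
        return f"altitude {command.altitude_m} m outside 0..{limits.max_altitude_m} m"
    distance = math.hypot(command.x_m, command.y_m)
    if distance > limits.geofence_radius_m:
        return f"target {distance:.1f} m from home exceeds geofence {limits.geofence_radius_m} m"
    return None


def execute(command: GotoCommand, adapter: ExecutionAdapter, limits: SafetyLimits) -> str:
    """Forward the command to the adapter only if validation passes."""
    reason = validate(command, limits)
    if reason is not None:
        return f"rejected: {reason}"
    adapter.send(command)
    return "accepted"
```

Because `execute` depends only on the `send` method, swapping `MockAdapter` for a simulator-backed adapter would not change the calling code, which is the property the list above attributes to the framework. The rejection string could be fed back to the agent as runtime feedback.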

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could support missions whose goals evolve mid-flight when new sensor data contradicts the original plan.
Pluggable model backends would allow direct comparison of different LLMs on the same aerial task set to measure decision quality.
Staged scripts that move from mock to simulator to real vehicle could shorten the path from prototype to fielded system.

Load-bearing premise

The combination of document-driven agent state, memory-driven reflection, and safety-oriented runtime validation will enable the LLM to make effective iterative decisions without unsafe or ineffective behavior in real UAV operations.

What would settle it

A flight test in which the agent receives a mission that requires repeated adaptation yet still issues unsafe commands or fails to reach the goal despite active validation and feedback loops.

Figures

Figures reproduced from arXiv: 2606.12142 by Chengwei Yan, Di Wang, Gang Liu, Guo Yu, Jianfei Yang, Ke Li, Luyao Zhang, Nan Luo, Quan Wang, Xiao Gao, Yuan Ding.

**Figure 2.** AerialClaw exposes agent behavior through human…

**Figure 3.** AerialClaw web console for monitoring an au…

Original abstract

Unmanned aerial vehicles (UAVs) are increasingly used in inspection, search and rescue, environmental monitoring, and emergency response. However, most UAV applications still rely on pre-defined command sequences or task-specific pipelines, where developers manually connect perception, planning, flight control, simulation, logging, and safety modules. This limits the flexibility, reproducibility, and extensibility of autonomous aerial systems. This paper presents AerialClaw, an open-source software framework that enables UAVs to operate as decision-making aerial agents rather than merely command-following platforms. Given a natural-language mission, AerialClaw allows an LLM-based agent to understand the task, maintain context, invoke executable aerial skills, observe perception and runtime feedback, and iteratively update its decisions in a closed loop. The framework adopts a modular brain-skill-runtime architecture, combining hard skills for atomic UAV operations, Markdown-based soft skills for reusable task strategies, document-driven agent state and capability boundaries, memory-driven reflection, safety-oriented runtime validation, and platform-agnostic execution adapters. AerialClaw supports lightweight mock execution, PX4 SITL with Gazebo, and AirSim-based simulation, together with a web console, pluggable model backends, example missions, simulation assets, and staged deployment scripts. By combining standardized aerial skills, document-driven agent state, memory, and closed-loop LLM decision-making, AerialClaw provides a reproducible and extensible open-source framework for building UAV systems that can interpret missions, make decisions, execute skills, and adapt their behavior from feedback.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

AerialClaw is a clean open-source toolkit that packages LLM control for UAVs around Markdown soft skills and document-driven state, but it offers no test results to show the closed loop actually works.

The paper's core contribution is a named framework that wires an LLM into UAV operations through a brain-skill-runtime split. Hard skills handle atomic flight commands, soft skills are stored as Markdown documents for reusable strategies, agent state lives in documents, and adapters let the same brain run on mock, PX4 SITL, or AirSim. The design also includes memory reflection and runtime safety checks. That combination of choices is new enough to be worth noting even if LLM-robotics integrations already exist elsewhere.
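To make "agent state lives in documents" and "memory reflection" concrete, here is a minimal sketch in which state is rendered to Markdown for the next prompt and failed skills leave a lesson behind. The field names, the document layout, and the rule of recording only failures are assumptions of this sketch, not details taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch; field names and document layout are assumptions.


@dataclass
class AgentState:
    mission: str
    battery_pct: int
    capabilities: List[str]
    lessons: List[str] = field(default_factory=list)

    def reflect(self, skill: str, succeeded: bool, note: str) -> None:
        """Store a lesson only when a skill failed, so later prompts carry it."""
        if not succeeded:
            self.lessons.append(f"{skill} failed: {note}")

    def to_markdown(self) -> str:
        """Render the state as a Markdown document for inclusion in the next prompt."""
        lines = [
            "# Agent state",
            f"- Mission: {self.mission}",
            f"- Battery: {self.battery_pct}%",
            "",
            "## Capabilities",
        ]
        lines += [f"- {name}" for name in self.capabilities]
        lines += ["", "## Lessons"]
        lines += [f"- {lesson}" for lesson in self.lessons] or ["- (none yet)"]
        return "\n".join(lines)


state = AgentState(
    mission="inspect roof",
    battery_pct=80,
    capabilities=["takeoff", "scan", "land"],
)
state.reflect("scan", succeeded=False, note="camera timeout")
state.reflect("land", succeeded=True, note="ok")
print(state.to_markdown())
```

Keeping state in a human-readable document means the same text can be shown to an operator and passed to the model, which may be part of the appeal of the document-driven design.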

The implementation details look practical. The authors supply example missions, simulation assets, a web console, pluggable model backends, and staged deployment scripts. Supporting both lightweight mocks and two different simulators lowers the barrier for someone who wants to experiment without buying hardware first. Releasing the code openly is the right move for a systems paper.

The main limitation is the complete absence of evaluation. The abstract and description claim the agent can maintain context, call skills, read feedback, and revise plans in a loop, yet no runs, success rates, failure modes, or safety incidents are reported. Without that data it is impossible to judge whether the safety validation or reflection steps actually prevent bad behavior in practice.

This work is for robotics groups that already run UAV simulations and want a concrete starting point for LLM agents rather than for readers seeking new theoretical results. The architecture is coherent on its own terms and the code artifacts make the claims checkable, so it clears the bar for peer review in a tools or systems track.

Referee Report

1 major / 2 minor

Summary. The manuscript presents AerialClaw, an open-source framework for LLM-driven autonomous aerial agents. It describes a modular brain-skill-runtime architecture that, given a natural-language mission, enables an LLM agent to interpret the task, maintain context via document-driven state and memory reflection, invoke hard and soft skills, observe perception/runtime feedback, apply safety validation, and iteratively adapt decisions in a closed loop. The framework includes platform-agnostic adapters, support for mock execution, PX4 SITL/Gazebo, and AirSim simulations, plus a web console, pluggable models, example missions, and deployment scripts.

Significance. If the described components integrate and operate as outlined, the framework would provide a valuable, reproducible open-source platform for UAV research. It addresses the manual integration burden in current UAV pipelines and could accelerate work on LLM-based decision-making for applications such as inspection, search-and-rescue, and environmental monitoring by supplying standardized skills, state management, and simulation support.

major comments (1)

[Abstract] Abstract and overall manuscript: the central claim that the framework 'allows an LLM-based agent to ... iteratively update its decisions in a closed loop' is presented without any reported experiments, success metrics, failure cases, or even qualitative demonstrations of end-to-end mission execution. This absence makes it impossible to assess whether the combination of document-driven state, memory reflection, and safety validation actually produces functional closed-loop behavior.

minor comments (2)

The distinction and interaction between 'hard skills' (atomic UAV operations) and 'Markdown-based soft skills' (reusable task strategies) would benefit from a concrete example or pseudocode snippet showing how an LLM selects and composes them.
Consider adding a short related-work subsection that positions AerialClaw against existing UAV autonomy frameworks (e.g., PX4, ROS2-based stacks) and LLM-agent tool-use systems to clarify the incremental contribution.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and the recommendation for major revision. The primary concern regarding the lack of empirical support for the closed-loop claim is valid and will be addressed in the revision.

Referee: [Abstract] Abstract and overall manuscript: the central claim that the framework 'allows an LLM-based agent to ... iteratively update its decisions in a closed loop' is presented without any reported experiments, success metrics, failure cases, or even qualitative demonstrations of end-to-end mission execution. This absence makes it impossible to assess whether the combination of document-driven state, memory reflection, and safety validation actually produces functional closed-loop behavior.

Authors: We agree that the manuscript, as currently written, provides no experiments, metrics, failure cases, or qualitative demonstrations of end-to-end execution, making it impossible for readers to verify the functional closed-loop behavior. The paper is structured as a systems/framework description focused on architecture, implementation, and open-source release rather than an evaluation study. The claims describe the intended operation of the brain-skill-runtime design. In the revised version we will add a dedicated section that walks through qualitative execution traces of the provided example missions in the supported simulation environments (PX4 SITL/Gazebo and AirSim). These traces will illustrate how document-driven state, memory reflection, and safety validation are used by the LLM agent to detect issues and iteratively revise decisions within a single mission run. We will also add an explicit limitations paragraph stating that quantitative benchmarking and real-world hardware trials are left to future work. revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper is a framework description for an open-source UAV agent architecture. It contains no equations, no fitted parameters, no predictions derived from data, and no derivation chain that could reduce to its inputs. The central claims describe modular components (brain-skill-runtime, document-driven state, memory reflection, safety validation) and their intended use in closed-loop LLM decision-making; these are presented as design choices enabling capabilities rather than as results proven by internal reduction or self-citation. No load-bearing self-citations, ansatzes, or uniqueness theorems appear. The derivation is self-contained as an engineering architecture outline.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical modeling present; the contribution is a software architecture description with no free parameters, axioms, or invented physical entities.

Reference graph

Works this paper leans on

12 extracted references · 1 canonical work page

[1]

Fernando Cladera, Zachary Ravichandran, Jason Hughes, Varun Murali, Carlos Nieto-Granda, M. Ani Hsieh, George J. Pappas, Camillo J. Taylor, and Vijay Kumar. AerialClaw: An Open-Source Framework for LLM-Driven Autonomous Aerial Agents MM ’26, Nov. 10–14, 2026, Rio de Janeiro, Brazil

2026
[2]

Air-Ground Collaboration for Language-Specified Missions in Unknown Environments. IEEE Transactions on Field Robotics (2025), 1–1

2025
[3]

Dongjie Huo, Haoyun Liu, Guoqing Liu, Dekang Qi, Zhiming Sun, Maoguo Gao, Jianxin He, Yandan Yang, Xinyuan Chang, Feng Xiong, et al. 2026. ABot-Claw: A Foundation for Persistent, Cooperative, and Self-Evolving Robotic Agents. arXiv preprint arXiv:2604.10096 (2026)

Pith/arXiv arXiv 2026
[4]

Nathan Koenig and Andrew Howard. 2004. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No. 04CH37566), Vol. 3. IEEE, 2149–2154

2004
[5]

Anis Koubâa, Basit Qureshi, Mohamed-Foued Sriti, Azza Allouch, Yasir Javed, Mohammed Alajlan, Omar Cheikhrouhou, Mohamed Khalgui, and Eduardo Tovar
[6]

Micro Air Vehicle Link (MAVLink) in a Nutshell: A Survey. In 2019 IEEE International Systems Conference (SysCon). IEEE, 1–8
[7]

Haokun Liu, Zhaoqi Ma, Yunong Li, Junichiro Sugihara, Yicheng Chen, Jinjie Li, and Moju Zhao. 2025. Hierarchical Language Models for Semantic Navigation and Manipulation in an Aerial-Ground Robotic System. arXiv preprint arXiv:2506.05020 (2025)

arXiv 2025
[8]

Lorenz Meier, Dominik Honegger, and Marc Pollefeys. 2015. PX4: A node-based multithreaded open source robotics framework for deeply embedded platforms. In 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 6235–6240

2015
[9]

OpenClaw Contributors. 2026. OpenClaw: Personal AI Assistant. https://github.com/openclaw/openclaw. Accessed: 2026-03-08

2026
[10]

Zachary Ravichandran, Varun Murali, Mariliza Tzes, George J. Pappas, and Vijay Kumar. 2025. SPINE: Online Semantic Planning for Missions with Incomplete Natural Language Specifications in Unstructured Environments. In 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 13714–13721. doi:10.1109/ICRA55743.2025.11128238

work page doi:10.1109/icra55743.2025.11128238 2025
[11]

Shital Shah, Debadeepta Dey, Chris Lovett, and Ashish Kapoor. 2017. AirSim: High-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics: Results of the 11th International Conference. Springer, 621–635

2017
[12]

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Zican Dong, Yupeng Hou, Beichen Zhang, Yingqian Min, Junjie Zhang, Peiyu Liu, et al. 2026. A survey of large language models. Frontiers of Computer Science 20, 12 (2026), 2012627

2026
