AerialClaw: An Open-Source Framework for LLM-Driven Autonomous Aerial Agents
Pith reviewed 2026-06-27 09:18 UTC · model grok-4.3
The pith
AerialClaw lets an LLM direct UAV missions by parsing natural language, calling skills, and revising plans from runtime feedback.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a natural-language mission, AerialClaw enables an LLM-based agent to understand the task, maintain context, invoke executable aerial skills, observe perception and runtime feedback, and iteratively update its decisions in a closed loop through a modular brain-skill-runtime architecture that includes document-driven state, memory-driven reflection, and safety-oriented validation.
What carries the argument
A modular brain-skill-runtime architecture that separates LLM reasoning from executable hard skills, Markdown soft skills, document-driven agent state, runtime validation, and platform-agnostic execution adapters.
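As a rough illustration of the closed loop this architecture implies, the sketch below wires a stand-in "brain" to a small registry of hard skills over a mock world state. Every name here (`Brain`, `HARD_SKILLS`, `run_mission`, the skill functions) is hypothetical and not taken from the AerialClaw codebase, and a rule-based policy replaces the LLM so the example stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout; none come from the AerialClaw repository.
Skill = Callable[[dict, dict], str]  # (world_state, args) -> feedback text


def takeoff(world: dict, args: dict) -> str:
    world["alt"] = args["alt"]
    return f"airborne at {world['alt']} m"


def fly_to(world: dict, args: dict) -> str:
    if world["alt"] <= 0:
        return "error: not airborne"
    world["pos"] = (args["x"], args["y"])
    return f"reached {world['pos']}"


# Hard skills: atomic operations the brain can invoke by name.
HARD_SKILLS: Dict[str, Skill] = {"takeoff": takeoff, "fly_to": fly_to}


@dataclass
class Brain:
    """Stand-in for the LLM: a rule-based policy that reacts to feedback."""
    goal: Tuple[float, float]
    history: List[str] = field(default_factory=list)  # simple memory of feedback

    def decide(self, world: dict) -> Tuple[str, dict]:
        if world["pos"] == self.goal:
            return "done", {}
        # Revise the plan when the last feedback reported a failure.
        if self.history and self.history[-1].startswith("error: not airborne"):
            return "takeoff", {"alt": 10.0}
        return "fly_to", {"x": self.goal[0], "y": self.goal[1]}


def run_mission(brain: Brain, world: dict, max_steps: int = 10) -> List[str]:
    """Decide, execute, observe feedback, and repeat until done."""
    trace = []
    for _ in range(max_steps):
        name, args = brain.decide(world)
        if name == "done":
            break
        feedback = HARD_SKILLS[name](world, args)
        brain.history.append(feedback)
        trace.append(f"{name} -> {feedback}")
    return trace


world = {"alt": 0.0, "pos": (0.0, 0.0)}
trace = run_mission(Brain(goal=(5.0, 3.0)), world)
for line in trace:
    print(line)
```

In this toy run the first `fly_to` fails because the vehicle is on the ground, the feedback is stored, and the next decision switches to `takeoff` before retrying. That is the kind of feedback-driven revision the paper attributes to the LLM, shown here only in miniature.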
If this is right
- UAV applications can accept varied natural-language instructions without developers rewriting pipelines for each new task.
- New skills can be added as code modules or Markdown documents and become immediately available to the agent without altering the decision loop.
- The same agent code runs unchanged across mock execution, PX4 SITL with Gazebo, and AirSim environments before physical deployment.
- Runtime validation catches invalid or unsafe commands before they reach the flight controller, supporting safer operation in simulation and on hardware.
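The last two points, one agent running against several backends and validation sitting in front of the flight controller, can be sketched as follows. This is a minimal sketch under stated assumptions: `Command`, `Adapter`, `MockAdapter`, `validate`, `dispatch`, the allowed-skill set, and the 120 m ceiling are all illustrative inventions, not the framework's actual interfaces or limits.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Command:
    skill: str
    alt_m: float = 0.0


class Adapter(Protocol):
    """Anything with send() can act as an execution backend."""
    def send(self, cmd: Command) -> str: ...


class MockAdapter:
    """Records commands instead of flying; a simulator-backed adapter
    would expose the same send() method (assumed design)."""
    def __init__(self) -> None:
        self.sent: List[Command] = []

    def send(self, cmd: Command) -> str:
        self.sent.append(cmd)
        return f"ok: {cmd.skill}"


ALLOWED_SKILLS = {"takeoff", "land", "hover"}  # assumed capability boundary
MAX_ALT_M = 120.0                              # assumed ceiling, for illustration only


def validate(cmd: Command) -> List[str]:
    """Return a list of problems; an empty list means the command may proceed."""
    problems = []
    if cmd.skill not in ALLOWED_SKILLS:
        problems.append(f"unknown skill '{cmd.skill}'")
    if not 0.0 <= cmd.alt_m <= MAX_ALT_M:
        problems.append(f"altitude {cmd.alt_m} m outside [0, {MAX_ALT_M}] m")
    return problems


def dispatch(cmd: Command, adapter: Adapter) -> str:
    """Validate first; only valid commands ever reach the adapter."""
    problems = validate(cmd)
    if problems:
        return "rejected: " + "; ".join(problems)
    return adapter.send(cmd)


adapter = MockAdapter()
results = [
    dispatch(Command("takeoff", alt_m=20.0), adapter),
    dispatch(Command("takeoff", alt_m=500.0), adapter),
    dispatch(Command("barrel_roll"), adapter),
]
for r in results:
    print(r)
```

Because `dispatch` depends only on the `Adapter` protocol, swapping the mock for a simulator or hardware backend would not change the calling code, and rejected commands come back as text that the agent could use as feedback for its next decision.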
Where Pith is reading between the lines
- The framework could support missions whose goals evolve mid-flight when new sensor data contradicts the original plan.
- Pluggable model backends would allow direct comparison of different LLMs on the same aerial task set to measure decision quality.
- Staged scripts that move from mock to simulator to real vehicle could shorten the path from prototype to fielded system.
Load-bearing premise
The combination of document-driven agent state, memory-driven reflection, and safety-oriented runtime validation will enable the LLM to make effective iterative decisions without unsafe or ineffective behavior in real UAV operations.
What would settle it
A flight test in which the agent receives a mission that requires repeated adaptation yet still issues unsafe commands or fails to reach the goal despite active validation and feedback loops.
Unmanned aerial vehicles (UAVs) are increasingly used in inspection, search and rescue, environmental monitoring, and emergency response. However, most UAV applications still rely on pre-defined command sequences or task-specific pipelines, where developers manually connect perception, planning, flight control, simulation, logging, and safety modules. This limits the flexibility, reproducibility, and extensibility of autonomous aerial systems. This paper presents AerialClaw, an open-source software framework that enables UAVs to operate as decision-making aerial agents rather than merely command-following platforms. Given a natural-language mission, AerialClaw allows an LLM-based agent to understand the task, maintain context, invoke executable aerial skills, observe perception and runtime feedback, and iteratively update its decisions in a closed loop. The framework adopts a modular brain-skill-runtime architecture, combining hard skills for atomic UAV operations, Markdown-based soft skills for reusable task strategies, document-driven agent state and capability boundaries, memory-driven reflection, safety-oriented runtime validation, and platform-agnostic execution adapters. AerialClaw supports lightweight mock execution, PX4 SITL with Gazebo, and AirSim-based simulation, together with a web console, pluggable model backends, example missions, simulation assets, and staged deployment scripts. By combining standardized aerial skills, document-driven agent state, memory, and closed-loop LLM decision-making, AerialClaw provides a reproducible and extensible open-source framework for building UAV systems that can interpret missions, make decisions, execute skills, and adapt their behavior from feedback.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents AerialClaw, an open-source framework for LLM-driven autonomous aerial agents. It describes a modular brain-skill-runtime architecture that, given a natural-language mission, enables an LLM agent to interpret the task, maintain context via document-driven state and memory reflection, invoke hard and soft skills, observe perception/runtime feedback, apply safety validation, and iteratively adapt decisions in a closed loop. The framework includes platform-agnostic adapters, support for mock execution, PX4 SITL/Gazebo, and AirSim simulations, plus a web console, pluggable models, example missions, and deployment scripts.
Significance. If the described components integrate and operate as outlined, the framework would provide a valuable, reproducible open-source platform for UAV research. It addresses the manual integration burden in current UAV pipelines and could accelerate work on LLM-based decision-making for applications such as inspection, search-and-rescue, and environmental monitoring by supplying standardized skills, state management, and simulation support.
major comments (1)
- [Abstract] Abstract and overall manuscript: the central claim that the framework 'allows an LLM-based agent to ... iteratively update its decisions in a closed loop' is presented without any reported experiments, success metrics, failure cases, or even qualitative demonstrations of end-to-end mission execution. This absence makes it impossible to assess whether the combination of document-driven state, memory reflection, and safety validation actually produces functional closed-loop behavior.
minor comments (2)
- The distinction and interaction between 'hard skills' (atomic UAV operations) and 'Markdown-based soft skills' (reusable task strategies) would benefit from a concrete example or pseudocode snippet showing how an LLM selects and composes them.
- Consider adding a short related-work subsection that positions AerialClaw against existing UAV autonomy frameworks (e.g., PX4, ROS2-based stacks) and LLM-agent tool-use systems to clarify the incremental contribution.
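To make the first minor comment concrete, the following sketch shows one way a Markdown soft skill could be composed from hard skills. It is an assumption-laden illustration, not the paper's method: the document layout, the `perimeter_scan` skill, the hard-skill names, and the helper functions are all hypothetical, and the steps are executed mechanically here, whereas in the described framework the LLM would presumably read such a document as guidance when choosing which hard skills to call.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical soft-skill document; the layout is an assumption.
SOFT_SKILL_MD = """\
# perimeter_scan
Fly a square around the target and capture an image on the way.

## Steps
1. takeoff
2. fly_square
3. capture_image
4. land
"""


@dataclass
class SoftSkill:
    name: str
    steps: List[str]


def parse_soft_skill(markdown: str) -> SoftSkill:
    """Extract the top-level title and the numbered step names."""
    title = re.search(r"^# (.+)$", markdown, flags=re.MULTILINE)
    steps = re.findall(r"^\d+\.\s+(\w+)\s*$", markdown, flags=re.MULTILINE)
    return SoftSkill(name=title.group(1).strip() if title else "", steps=steps)


# Hard skills: stubs that only log their own name when called.
log: List[str] = []
HARD_SKILLS: Dict[str, Callable[[], None]] = {
    name: (lambda n=name: log.append(n))
    for name in ("takeoff", "fly_square", "capture_image", "land")
}


def run_soft_skill(skill: SoftSkill) -> None:
    """Check every step against the hard-skill registry, then execute in order."""
    missing = [s for s in skill.steps if s not in HARD_SKILLS]
    if missing:
        raise KeyError(f"soft skill '{skill.name}' references unknown hard skills: {missing}")
    for step in skill.steps:
        HARD_SKILLS[step]()


skill = parse_soft_skill(SOFT_SKILL_MD)
run_soft_skill(skill)
print(skill.name, log)
```

Keeping the strategy in a document and the operations in code means a new strategy can be added by writing Markdown alone, while the check against the registry prevents a document from referencing operations the vehicle does not have.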
Simulated Author's Rebuttal
We thank the referee for the detailed review and the recommendation for major revision. The primary concern regarding the lack of empirical support for the closed-loop claim is valid and will be addressed in the revision.
- Referee: [Abstract] Abstract and overall manuscript: the central claim that the framework 'allows an LLM-based agent to ... iteratively update its decisions in a closed loop' is presented without any reported experiments, success metrics, failure cases, or even qualitative demonstrations of end-to-end mission execution. This absence makes it impossible to assess whether the combination of document-driven state, memory reflection, and safety validation actually produces functional closed-loop behavior.
  Authors: We agree that the manuscript, as currently written, provides no experiments, metrics, failure cases, or qualitative demonstrations of end-to-end execution, making it impossible for readers to verify the functional closed-loop behavior. The paper is structured as a systems/framework description focused on architecture, implementation, and open-source release rather than an evaluation study. The claims describe the intended operation of the brain-skill-runtime design. In the revised version we will add a dedicated section that walks through qualitative execution traces of the provided example missions in the supported simulation environments (PX4 SITL/Gazebo and AirSim). These traces will illustrate how document-driven state, memory reflection, and safety validation are used by the LLM agent to detect issues and iteratively revise decisions within a single mission run. We will also add an explicit limitations paragraph stating that quantitative benchmarking and real-world hardware trials are left to future work.
  Revision: yes
Circularity Check
No significant circularity
The paper is a framework description for an open-source UAV agent architecture. It contains no equations, no fitted parameters, no predictions derived from data, and no derivation chain that could reduce to its inputs. The central claims describe modular components (brain-skill-runtime, document-driven state, memory reflection, safety validation) and their intended use in closed-loop LLM decision-making; these are presented as design choices enabling capabilities rather than as results proven by internal reduction or self-citation. No load-bearing self-citations, ansatzes, or uniqueness theorems appear. The paper is self-contained as an engineering architecture outline.
Reference graph
Works this paper leans on
-
[1]
Fernando Cladera, Zachary Ravichandran, Jason Hughes, Varun Murali, Carlos Nieto-Granda, M. Ani Hsieh, George J. Pappas, Camillo J. Taylor, and Vijay Kumar. AerialClaw: An Open-Source Framework for LLM-Driven Autonomous Aerial Agents MM ’26, Nov. 10–14, 2026, Rio de Janeiro, Brazil
2026
-
[2]
Air-Ground Collaboration for Language-Specified Missions in Unknown Environments. IEEE Transactions on Field Robotics (2025), 1–1
2025
-
[3]
Dongjie Huo, Haoyun Liu, Guoqing Liu, Dekang Qi, Zhiming Sun, Maoguo Gao, Jianxin He, Yandan Yang, Xinyuan Chang, Feng Xiong, et al. 2026. ABot-Claw: A Foundation for Persistent, Cooperative, and Self-Evolving Robotic Agents. arXiv preprint arXiv:2604.10096 (2026)
Pith/arXiv arXiv 2026
-
[4]
Nathan Koenig and Andrew Howard. 2004. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No. 04CH37566), Vol. 3. IEEE, 2149–2154
2004
-
[5]
Anis Koubâa, Basit Qureshi, Mohamed-Foued Sriti, Azza Allouch, Yasir Javed, Mohammed Alajlan, Omar Cheikhrouhou, Mohamed Khalgui, and Eduardo Tovar
-
[6]
Micro Air Vehicle Link (MAVLink) in a Nutshell: A Survey. In 2019 IEEE International Systems Conference (SysCon). IEEE, 1–8
-
[7]
Haokun Liu, Zhaoqi Ma, Yunong Li, Junichiro Sugihara, Yicheng Chen, Jinjie Li, and Moju Zhao. 2025. Hierarchical Language Models for Semantic Navigation and Manipulation in an Aerial-Ground Robotic System. arXiv preprint arXiv:2506.05020 (2025)
arXiv 2025
-
[8]
Lorenz Meier, Dominik Honegger, and Marc Pollefeys. 2015. PX4: A node-based multithreaded open source robotics framework for deeply embedded platforms. In 2015 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 6235–6240
2015
-
[9]
OpenClaw Contributors. 2026. OpenClaw: Personal AI Assistant. https://github.com/openclaw/openclaw. Accessed: 2026-03-08
2026
-
[10]
Zachary Ravichandran, Varun Murali, Mariliza Tzes, George J. Pappas, and Vijay Kumar. 2025. SPINE: Online Semantic Planning for Missions with Incomplete Natural Language Specifications in Unstructured Environments. In 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 13714–13721. doi:10.1109/ICRA55743.2025.11128238
-
[11]
Shital Shah, Debadeepta Dey, Chris Lovett, and Ashish Kapoor. 2017. AirSim: High-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics: Results of the 11th International Conference. Springer, 621–635
2017
-
[12]
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Zican Dong, Yupeng Hou, Beichen Zhang, Yingqian Min, Junjie Zhang, Peiyu Liu, et al. 2026. A survey of large language models. Frontiers of Computer Science 20, 12 (2026), 2012627
2026