arxiv: 2506.07509 · v1 · pith:3DTSZ7UHnew · submitted 2025-06-09 · 💻 cs.RO

Taking Flight with Dialogue: Enabling Natural Language Control for PX4-based Drone Agent

classification 💻 cs.RO
keywords control, language, aerial, agentic, families, flight, model, models

Recent advances in agentic and physical artificial intelligence (AI) have largely focused on ground-based platforms such as humanoid and wheeled robots, leaving aerial robots relatively underexplored. Meanwhile, state-of-the-art unmanned aerial vehicle (UAV) multimodal vision-language systems typically rely on closed-source models accessible only to well-resourced organizations. To democratize natural language control of autonomous drones, we present an open-source agentic framework that integrates PX4-based flight control, Robot Operating System 2 (ROS 2) middleware, and locally hosted models using Ollama. We evaluate performance both in simulation and on a custom quadcopter platform, benchmarking four large language model (LLM) families for command generation and three vision-language model (VLM) families for scene understanding.
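The abstract does not say how model output is turned into flight commands. As a minimal sketch of one plausible step in such a pipeline, the Python below validates a JSON reply from a locally hosted LLM before anything would be forwarded to the flight stack. The command vocabulary, field names, altitude ceiling, and the names `DroneCommand` and `parse_llm_command` are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import json
from dataclasses import dataclass

# Hypothetical command vocabulary; the paper's actual schema is not
# described in the abstract.
ALLOWED_ACTIONS = {"takeoff", "land", "goto", "hover"}
MAX_ALTITUDE_M = 30.0  # assumed safety ceiling, illustrative only


@dataclass(frozen=True)
class DroneCommand:
    action: str
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0


def parse_llm_command(reply: str) -> DroneCommand:
    """Parse a model reply that is expected to be a single JSON object.

    Raises ValueError for anything that is not a well-formed, in-range
    command, so a malformed reply never reaches the vehicle.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")

    try:
        x, y, z = (float(data.get(key, 0.0)) for key in ("x", "y", "z"))
    except (TypeError, ValueError) as exc:
        raise ValueError("coordinates must be numeric") from exc

    # The chained comparison also rejects NaN, which compares false.
    if not 0.0 <= z <= MAX_ALTITUDE_M:
        raise ValueError(f"altitude {z} m outside 0..{MAX_ALTITUDE_M} m")
    return DroneCommand(action, x, y, z)
```

Rejecting a reply outright, rather than trying to repair it, keeps the failure mode simple: the caller can prompt the model again or ask the operator to rephrase.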

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap

    cs.RO 2026-04 unverdicted novelty 4.0

    A survey of UAV vision-and-language navigation that establishes a methodological taxonomy, reviews resources and challenges, and proposes a forward-looking research roadmap.

  2. A Universal Large Language Model -- Drone Command and Control Interface

    cs.RO 2026-01 unverdicted novelty 4.0

    A universal LLM-to-drone interface is implemented via the Model Context Protocol (MCP) and MAVLink, demonstrated with real UAV flight control and simulated flights using live map data.