Generative AI for Safe and Photorealistic Drone Light Shows

Alexander Gräfe; Pascal Reinhold; Sebastian Trimpe

arxiv: 2606.25458 · v1 · pith:WFOKJTS5new · submitted 2026-06-24 · 💻 cs.RO

Pascal Reinhold, Alexander Gräfe, Sebastian Trimpe

Pith reviewed 2026-06-25 21:31 UTC · model grok-4.3

classification 💻 cs.RO

keywords drone light shows · generative AI · swarm robotics · adaptive point tracking · text-to-video · trajectory planning · collision avoidance · photorealistic animation

The pith

SWAN converts text prompts into photorealistic collision-free drone trajectories via video generation and adaptive tracking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SWAN as an end-to-end system that turns text descriptions into drone light show choreographies without manual animation. It generates reference videos from the text, then applies a new tracking method to extract motion patterns that become drone paths while preserving visual coherence. A planner assigns those paths to individual drones and a safety filter removes collision risks. This matters because drone shows have been limited by labor-intensive design, and the approach runs at large scale on ordinary computers. Tests show it handling simulated swarms of 2000 drones and real flights with 49 quadcopters.
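The allocation step described above, in which a planner assigns paths to individual drones, can be illustrated with a generic minimum-cost assignment. This is a minimal sketch under stated assumptions, not the paper's planner: the function name `assign_targets`, the squared-distance cost, and the toy coordinates are all hypothetical, and the exhaustive search is only practical for a handful of drones.

```python
import math
from itertools import permutations


def assign_targets(starts, targets):
    """Return a tuple `a` where drone i is sent to targets[a[i]].

    Hypothetical helper: exhaustively searches all one-to-one assignments
    and keeps the one with the smallest total squared travel distance.
    """
    if len(starts) != len(targets):
        raise ValueError("need exactly one target per drone")

    def total_cost(perm):
        # Sum of squared distances for this candidate assignment.
        return sum(math.dist(s, targets[k]) ** 2 for s, k in zip(starts, perm))

    return min(permutations(range(len(targets))), key=total_cost)


# Toy example: three drones on the ground line, three targets one metre up.
starts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
targets = [(2.0, 1.0), (0.0, 1.0), (1.0, 1.0)]
assignment = assign_targets(starts, targets)
print(assignment)  # each drone is matched to the target directly above it
```

The search cost grows factorially with swarm size, so a swarm of thousands would need a polynomial-time assignment solver or a heuristic instead of this brute-force loop.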

Core claim

SWAN is an end-to-end pipeline that synthesizes photorealistic, large-scale, and collision-free drone choreographies directly from text prompts. SWAN converts text into realistic reference videos and translates these pixel-space dynamics into physical swarm kinematics using a novel adaptive point-tracking algorithm. Unlike existing trackers, this method maintains spatial coherence through severe occlusions and rapid topological shifts. A dedicated planner then allocates these trajectories to individual drones, while a subsequent safety filter ensures collision-free execution. The system demonstrates scalability by safely orchestrating simulated 2,000-drone formations and validates physical feasibility on a dense real-world swarm of 49 quadcopters, all on standard consumer hardware.

What carries the argument

Adaptive point-tracking algorithm that maintains spatial coherence to translate pixel dynamics from generated videos into physical swarm kinematics despite occlusions and topological shifts.

If this is right

Drone light shows can be created directly from text prompts without manual keyframing or animation.
The pipeline scales to formations of 2000 drones while remaining collision-free in simulation.
Physical feasibility holds for dense swarms of 49 quadcopters in real-world tests.
All computation runs on standard consumer hardware without specialized equipment.
Multi-robot choreography design becomes automated and accessible through generative AI.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The video-to-kinematics translation step could extend to other robot teams where motion is first visualized in 2D.
Better text-to-video models would directly raise the visual quality of the generated drone patterns.
The safety filter combined with trajectory allocation might apply to coordinating ground robot swarms or mixed aerial-ground teams.
Lowering the design effort could enable drone displays for smaller events, education, or temporary installations.

Load-bearing premise

The adaptive point-tracking algorithm can maintain spatial coherence and accurately translate pixel-space dynamics from generated videos into physical swarm kinematics despite severe occlusions and rapid topological shifts.

What would settle it

A generated video containing many overlapping and crossing motions is fed through the full pipeline, after which the output trajectories are executed in a physics simulator to check for any collisions or loss of intended visual patterns.
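The collision check in this test can be sketched as a minimum-separation scan over the output trajectories. This is a rough illustration only: the function names, the sampled-waypoint representation, and the safety radius are assumptions, and a real evaluation would run the trajectories through a physics simulator rather than compare waypoints.

```python
import math
from itertools import combinations


def min_separation(trajectories):
    """Return (distance, step, i, j) for the closest approach of any drone pair.

    `trajectories[i][t]` is the (x, y, z) position of drone i at time step t;
    all drones are assumed to share the same time grid.
    """
    best = (math.inf, -1, -1, -1)
    for t in range(len(trajectories[0])):
        for i, j in combinations(range(len(trajectories)), 2):
            d = math.dist(trajectories[i][t], trajectories[j][t])
            if d < best[0]:
                best = (d, t, i, j)
    return best


def is_collision_free(trajectories, safety_radius):
    """True if no pair of drones ever comes closer than `safety_radius`."""
    return min_separation(trajectories)[0] >= safety_radius


# Toy example: two drones swap places and pass close to each other mid-way.
drone_a = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
drone_b = [(2.0, 0.0, 1.0), (1.0, 0.2, 1.0), (0.0, 0.0, 1.0)]
closest = min_separation([drone_a, drone_b])
print(closest, is_collision_free([drone_a, drone_b], safety_radius=0.5))
```

The pairwise scan is quadratic in the number of drones at every time step, so at the 2,000-drone scale a spatial index (grid or k-d tree) would be the more likely choice.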

Original abstract

Drone light shows are redefining aerial entertainment, yet their widespread adoption is bottlenecked by labor-intensive, manual animation. While generative AI promises an automated alternative, current frameworks fail to provide photorealism with fluid, dynamic motion. To address this limitation, we introduce SWAN, an end-to-end pipeline that synthesizes photorealistic, large-scale, and collision-free drone choreographies directly from text prompts. SWAN converts text into realistic reference videos and translates these pixel-space dynamics into physical swarm kinematics using a novel, adaptive point-tracking algorithm. Unlike existing trackers, this method maintains spatial coherence through severe occlusions and rapid topological shifts. A dedicated planner then allocates these trajectories to individual drones, while a subsequent safety filter ensures collision-free execution. We demonstrate scalability by safely orchestrating simulated 2,000-drone formations and validate physical feasibility on a dense real-world swarm of 49 quadcopters, operating everything entirely on standard consumer hardware. Combined, this work demonstrates how generative AI can be leveraged to automate multi-robot choreography design, providing an accessible new framework for drone light shows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SWAN is a practical end-to-end text-to-drone pipeline with real hardware runs, but the tracker at its core has no supporting numbers or comparisons.

The paper's main contribution is SWAN, a pipeline that takes a text prompt, generates a reference video, uses a custom adaptive point tracker to map pixel motion to drone trajectories, plans assignments, and applies a safety filter. The authors report runs with 2,000 drones in simulation and 49 real quadcopters, all on consumer hardware.

What works is the applied focus. Getting a full chain from language to safe physical flight and showing it on actual hardware is useful for the drone entertainment space, where choreography is still mostly manual. The real-world test with 49 drones gives some credibility that the system can close the loop.

The soft spot is the tracker. The abstract says it maintains coherence under severe occlusions and rapid topology changes, and the whole claim of photorealistic large-scale shows rests on that step working without drift or misassignment. There are no error metrics, no occlusion-specific tests, no ablation, and no head-to-head numbers against standard trackers on the failure modes they highlight. The stress-test note is accurate on this point.

The paper is aimed at robotics practitioners and drone-show engineers who want an automated starting point rather than theorists. A reader looking for a working system description and hardware validation will find something usable here.

It deserves peer review because it ships a complete pipeline with physical results, even though the evaluation section needs quantitative support for the central technical claim.

Referee Report

2 major / 1 minor

Summary. The paper introduces SWAN, an end-to-end pipeline that generates photorealistic drone light shows from text prompts by first creating reference videos via generative AI, then using a novel adaptive point-tracking algorithm to map pixel dynamics to physical swarm trajectories (claimed to handle severe occlusions and rapid topological shifts), followed by trajectory planning and a safety filter for collision avoidance. It reports successful demonstrations scaling to 2,000 drones in simulation and 49 quadcopters in the real world, all on consumer hardware.

Significance. If the central claims hold, the work could meaningfully advance automated multi-robot choreography design for entertainment applications by reducing reliance on manual animation. The reported scale (2000/49 drones) and end-to-end text-to-execution framing are potentially impactful for the robotics and graphics communities. However, the absence of any quantitative metrics, ablations, or error analysis in the provided manuscript text makes it impossible to evaluate whether the novel tracker or overall pipeline delivers on its robustness and photorealism promises.

major comments (2)

[Abstract] Abstract (method description): The central claim that the adaptive point-tracking algorithm 'maintains spatial coherence through severe occlusions and rapid topological shifts' is load-bearing for translating generated videos into coherent 3D swarm kinematics, yet the manuscript supplies no quantitative error metrics, occlusion-specific ablations, failure-case analysis, or comparisons against standard trackers on the cited failure modes.
[Abstract] Abstract (demonstrations): The scalability claims rest on 'safely orchestrating simulated 2,000-drone formations' and 'dense real-world swarm of 49 quadcopters,' but no success rates, collision counts, trajectory error statistics, or baseline comparisons are reported to substantiate collision-free execution or photorealism at these scales.

minor comments (1)

[Abstract] The abstract is concise but would benefit from one or two key quantitative highlights (e.g., tracking accuracy or collision rate) to allow readers to gauge the strength of the empirical support.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential impact of the SWAN pipeline. We agree that the current manuscript lacks the quantitative evaluations needed to fully substantiate the central claims. We will revise the manuscript to address these gaps while maintaining the focus on the end-to-end text-to-trajectory framework.

Referee: [Abstract] Abstract (method description): The central claim that the adaptive point-tracking algorithm 'maintains spatial coherence through severe occlusions and rapid topological shifts' is load-bearing for translating generated videos into coherent 3D swarm kinematics, yet the manuscript supplies no quantitative error metrics, occlusion-specific ablations, failure-case analysis, or comparisons against standard trackers on the cited failure modes.

Authors: We agree that quantitative support is required to validate the adaptive point-tracking algorithm's performance under the described conditions. In the revised manuscript we will add tracking error metrics (e.g., average pixel displacement and 3D trajectory deviation), occlusion-specific ablations, a dedicated failure-case analysis, and direct comparisons against standard trackers such as KLT, SORT, and DeepSORT on sequences exhibiting severe occlusions and topological changes. These additions will be placed in a new experimental subsection. revision: yes
Referee: [Abstract] Abstract (demonstrations): The scalability claims rest on 'safely orchestrating simulated 2,000-drone formations' and 'dense real-world swarm of 49 quadcopters,' but no success rates, collision counts, trajectory error statistics, or baseline comparisons are reported to substantiate collision-free execution or photorealism at these scales.

Authors: We acknowledge that the scalability and safety claims would be strengthened by supporting statistics. The revised manuscript will report success rates, collision counts (zero in the presented runs), trajectory error statistics (position and velocity RMSE), and comparisons against baseline planners for both the 2,000-drone simulation and the 49-drone real-world experiments. Photorealism will be supported by additional perceptual metrics where feasible. revision: yes
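The position RMSE named in this response is a standard metric and could be computed along the following lines. This is a hedged sketch, not the authors' evaluation code: the function name `position_rmse`, the assumption that planned and flown positions are sampled at the same time steps, and the example numbers are all hypothetical.

```python
import math


def position_rmse(planned, flown):
    """Root-mean-square Euclidean position error between two trajectories.

    Both inputs are equal-length lists of (x, y, z) samples taken at the
    same time steps (an assumption of this sketch).
    """
    if not planned or len(planned) != len(flown):
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [math.dist(p, q) ** 2 for p, q in zip(planned, flown)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy example: the flown path deviates by 0.1 m at two of three samples.
planned = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
flown = [(0.0, 0.0, 1.0), (1.0, 0.1, 1.0), (2.0, 0.0, 1.1)]
rmse = position_rmse(planned, flown)
print(f"position RMSE: {rmse:.4f} m")
```

A velocity RMSE would follow the same pattern applied to finite-difference velocities of the two trajectories.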

Circularity Check

0 steps flagged

No circularity: sequential pipeline of independent modules

The described SWAN derivation is a linear sequence of distinct engineering stages (text-to-video generation, adaptive point-tracking for kinematics translation, trajectory allocation by the planner, and safety filtering), none of which is defined in terms of its own outputs or reduced to fitted parameters by construction. The abstract and reader's summary provide no equations, self-citations, or uniqueness theorems that would make any step tautological. The central claim therefore rests on the empirical performance of these components rather than any self-referential reduction, qualifying as self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the 'novel adaptive point-tracking algorithm' is described functionally but not decomposed into parameters or assumptions.

Reference graph

Works this paper leans on

41 extracted references · 2 canonical work pages

[1] International Olympic Committee. Spectacular Intel drone light show helps bring Tokyo 2020 to life. URL https://www.olympics.com/ioc/news/spectacular-intel-drone-light-show-helps-bring-tokyo-2020-to-life-1
[2] SKYMAGIC. FIFA world cup 2022 | SKYMAGIC drone shows. URL https://skymagic.show/project/fifa-world-cup-2022/
[3] Sky Elements Drones. Austin new year’s drone show, 2025. URL https://skyelementsdrones.com/austin-new-years. Accessed: 03.04.2026
[4] Xinhua News Agency. China’s spring festival drone shows light up sky across globe, 2026. URL https://english.news.cn/20260215/3c92d0d9daa845a78f53e6b194482970/c.html. Accessed: 03.04.2026
[5–6] CyberDrone. Drone shows for music festivals, 2026. URL https://www.cyberdrone.com/blog/drone-shows-for-music-festivals. Accessed: 03.04.2026
[7] Dharna Nar and Radhika Kotecha. Optimal waypoint assignment for designing drone light show formations. Results in Control and Optimization, 9:100174, 2022
[8] Kai-Chun Weng, Shu-Ting Lin, Chen-Chi Hu, Ru-Tai Soong, and Ming-Te Chi. Multi-view approach for drone light show. The Visual Computer, 39(11):5797–5808, 2023
[9] James O’Malley. There’s no business like drone business. Engineering & Technology, 16(4):72–79, 2021
[10] Gene Eu Jan, Tingjun Lei, Chi-Chia Sun, Zong-Ying You, and Chaomin Luo. On the problems of drone formation and light shows. IEEE Transactions on Consumer Electronics, 70(3):5259–5268, 2024
[11] Vimdrones. Drone light show designer. URL https://docs.vimdrones.com/designer/
[12] CollMot Robotics. Skybrush studio - drone show design solutions. URL https://skybrush.io/modules/studio/
[13] SPH Engineering. Drone show software by SPH Engineering | software for drone light shows. URL https://www.droneshowsoftware.com/drone-show-software
[14] Pablo Pueyo, Eduardo Montijano, Ana C Murillo, and Mac Schwager. Clipswarm: Generating drone shows from text prompts with vision-language models. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 11917–11923. IEEE, 2024
[15] Carlos Plou, Pablo Pueyo, Ruben Martinez-Cantin, Mac Schwager, Ana C Murillo, and Eduardo Montijano. Gen-Swarms: Adapting deep generative models to swarms of drones. In European Conference on Computer Vision, pages 85–101. Springer, 2024
[16] Artem Lykov, Sausar Karaf, Mikhail Martynov, Valerii Serpiva, Aleksey Fedoseev, Mikhail Konenkov, and Dzmitry Tsetserukou. FlockGPT: Guiding UAV flocking with linguistic orchestration. In 2024 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pages 485–488. IEEE, 2024
[17] Aoran Jiao, Tanmay P Patel, Sanjmi Khurana, Anna-Mariya Korol, Lukas Brunke, Vivek K Adajania, Utku Culha, Siqi Zhou, and Angela P Schoellig. Swarm-GPT: Combining large language models with safe motion planning for robot choreography design. arXiv preprint arXiv:2312.01059, 2023
[18] Martin Schuck, Dinushka Orrin Dahanaggamaarachchi, Ben Sprenger, Vedant Vyas, Siqi Zhou, and Angela P Schoellig. SwarmGPT: Combining large language models with safe motion planning for drone swarm choreography. IEEE Robotics and Automation Letters, 2025
[19] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–
[20] Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, et al. Wan: Open and advanced large-scale video generative models. arXiv preprint arXiv:2503.20314, 2025
[21] Learning Systems and Robotics Lab. axswarm: Swarm trajectory planning algorithm implemented in JAX. https://github.com/learnsyslab/axswarm, 2024
[22] Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, and Christian Rupprecht. Cotracker: It is better to track together. In European conference on computer vision, pages 18–35. Springer, 2024
[23] Nikita Karaev, Yuri Makarov, Jianyuan Wang, Natalia Neverova, Andrea Vedaldi, and Christian Rupprecht. Cotracker3: Simpler and better point tracking by pseudo-labelling real videos. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6013–6022, 2025
[24] Martin Schuck, Marcel P. Rath, Yufei Hua, Abhishek Goudar, Siqi Zhou, and Angela P. Schoellig. Crazyflow: An accurate, GPU-accelerated, differentiable drone simulator in JAX, 2026. URL https://arxiv.org/abs/2606.01478
[25] Bitcraze AB. Lighthouse Positioning System. https://www.bitcraze.io/documentation/system/positioning/ligthouse-positioning-system/, 2024. Accessed: 2024-05-14
[26] Bitcraze AB. Crazyflie Python Library V2. https://github.com/bitcraze/crazyflie-lib-python-v2, 2024. Accessed: 2026-04-14
[27] Bitcraze AB. Color LED Deck. https://www.bitcraze.io/products/color-led-deck/, 2024. Accessed: 2026-04-14
[28] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019
[29–30] Yoshihiko Ozaki, Yuki Tanigaki, Shuhei Watanabe, Masahiro Nomura, and Masaki Onishi. Multiobjective tree-structured Parzen estimator. Journal of Artificial Intelligence Research, 73:1209–1250. doi: 10.1613/jair.1.13188
[31] Tencent Hunyuan 3D Digital Human Team. HY-Motion 1.0: Scaling flow matching models for text-to-motion generation. arXiv preprint arXiv:2512.23464, 2025
[32] Tencent Hunyuan. HunyuanVideo 1.5 prompt handbook. https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5_Prompt_Handbook_EN.md,
[33] Qwen Team. Qwen3.5: Towards native multimodal agents, February 2026. URL https://qwen.ai/blog?id=qwen3.5
[34] Image Team, Huanqia Cai, Sihan Cao, Ruoyi Du, Peng Gao, Steven Hoi, Zhaohui Hou, Shijie Huang, Dengyang Jiang, Xin Jin, Liangchen Li, Zhen Li, Zhong-Yu Li, David Liu, Dongyang Liu, Junhan Shi, Qilong Wu, Feng Yu, Chi Zhang, Shifeng Zhang, and Shilin Zhou. Z-Image: An efficient image generation foundation model with single-stream diffusion transformer, 20...
[35] Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Lili... SAM 3: Segment anything with concepts, 2026
[36] Qiang Du, Vance Faber, and Max Gunzburger. Centroidal Voronoi tessellations: Applications and algorithms. SIAM Review, 41(4):637–676, 1999. doi: 10.1137/S0036144599352836. URL https://doi.org/10.1137/S0036144599352836
[37] Steven H Strogatz. Nonlinear dynamics and chaos: With applications to physics, biology, chemistry, and engineering. Chapman and Hall/CRC, 2024
[38] Richard Karp. Reducibility among combinatorial problems. volume 40, pages 85–103, 01 1972. ISBN 978-3-540-68274-5. doi: 10.1007/978-3-540-68279-0_8
[39] Grace Wahba. Spline models for observational data. 1990. doi: 10.1137/1.9781611970128. URL https://doi.org/10.1137/1.9781611970128
[40] Vivek K Adajania, Siqi Zhou, Arun Kumar Singh, and Angela P Schoellig. AMSwarm: An alternating minimization approach for safe motion planning of quadrotor swarms in cluttered environments. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 1421–
[41] Daniel Mellinger and Vijay Kumar. Minimum snap trajectory generation and control for quadrotors. pages 2520–2525, 06 2011. doi: 10.1109/ICRA.2011.5980409
