CommandSwarm: Safety-Aware Natural Language-to-Behavior-Tree Generation for Robotic Swarms

Amjad Yousef Majid; Mohammed Majid

arxiv: 2605.07764 · v1 · submitted 2026-05-08 · 💻 cs.RO

Pith reviewed 2026-05-11 02:50 UTC · model grok-4.3

classification 💻 cs.RO

keywords: natural language interfaces · behavior trees · robotic swarms · LoRA adaptation · safety filtering · large language models · swarm control · parser validation

The pith

Inside a safety-gated pipeline, LoRA adaptation of a 10B LLM lifts zero-shot, parser-accepted behavior-tree generation for robot swarms from 0 to 72 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how natural-language commands can be turned into executable behavior trees for groups of robots without producing unsafe or malformed outputs. It combines translation, safety filters, constrained prompts, and a parser that checks against a fixed list of allowed swarm actions. When Falcon3-Instruct-10B is adapted with LoRA on synthetic examples, zero-shot BLEU rises from 0.267 to 0.663 and parser-accepted trees jump from 0 percent to 72 percent. Few-shot prompting helps some models but the adaptation delivers the largest reliable gains. The work demonstrates that generation quality by itself is not enough and that explicit validation steps remain essential for practical use.
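For readers unfamiliar with the adaptation step, the standard LoRA formulation (Hu et al., 2021) freezes the pretrained weight matrix and trains only a low-rank update. The symbols below follow that general formulation; the rank and scaling actually used for Falcon3-Instruct-10B in CommandSwarm are not stated in this summary, so no values are assumed here.

```latex
W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Only $A$ and $B$ are updated during fine-tuning while $W_0$ stays fixed, which keeps the number of trainable parameters small relative to the full model.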

Core claim

CommandSwarm integrates multilingual translation, command-level safety filtering, constrained prompting, a LoRA-adapted 10B LLM, and deterministic parser validation to produce XML behavior trees from speech or text. On representative swarm scenarios, LoRA adaptation of Falcon3-Instruct-10B raises zero-shot BLEU from 0.267 to 0.663, ROUGE-L from 0.366 to 0.692, and parser-accepted syntactic validity from 0 percent to 72 percent. Without adaptation, prompt-engineered Falcon3-Instruct-10B and Mistral-7B-v3 reach BLEU above 0.60 only in few-shot settings.

What carries the argument

The safety-aware language-to-behavior-tree pipeline that chains translation, safety filtering, constrained LLM prompting, and whitelist-based parser validation.
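The chaining described above can be sketched as a sequence of gates, where any stage may stop a command before it reaches the robots. This is a minimal illustration, not the authors' implementation: `run_pipeline`, `PipelineResult`, and the four toy stage functions are hypothetical names standing in for the real translation model, safety filter, LLM, and parser.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineResult:
    accepted: bool
    stage: str                      # stage that produced the final decision
    tree_xml: Optional[str] = None  # only set when the tree is accepted


def run_pipeline(
    command: str,
    translate: Callable[[str], str],
    is_safe: Callable[[str], bool],
    generate_bt: Callable[[str], str],
    validate_bt: Callable[[str], bool],
) -> PipelineResult:
    """Chain the stages; a failed gate returns early with no tree."""
    english = translate(command)
    if not is_safe(english):
        return PipelineResult(False, "safety_filter")
    tree_xml = generate_bt(english)
    if not validate_bt(tree_xml):
        return PipelineResult(False, "parser")
    return PipelineResult(True, "accepted", tree_xml)


# Toy stand-ins for the real components (all hypothetical).
def toy_translate(text: str) -> str:
    return text.strip()


def toy_is_safe(text: str) -> bool:
    return "attack" not in text.lower()


def toy_generate(text: str) -> str:
    return "<BehaviorTree><Sequence><Action name='flock'/></Sequence></BehaviorTree>"


def toy_validate(xml_text: str) -> bool:
    return xml_text.startswith("<BehaviorTree>")


ok = run_pipeline("form a flock", toy_translate, toy_is_safe, toy_generate, toy_validate)
blocked = run_pipeline("attack the operator", toy_translate, toy_is_safe, toy_generate, toy_validate)
```

Passing the stages in as callables keeps the gate order explicit and lets each component be swapped or tested in isolation.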

If this is right

Compact quantized LLMs can produce useful swarm behavior trees when placed inside a validated pipeline.
Parser acceptance and safety filtering stay necessary even after adaptation improves generation scores.
Few-shot prompting raises baseline quality for several models but adaptation yields stronger zero-shot results.
Multilingual front-end models such as SeamlessM4T v2-large balance quality and speed for non-English commands.
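To make the parser-acceptance gate concrete, here is a small sketch of deterministic validation against a whitelist, using Python's standard `xml.etree.ElementTree`. The node names (`BehaviorTree`, `Sequence`, `Selector`, `Action`) and the action list are assumptions chosen for illustration; the paper's actual XML schema and primitive whitelist are not reproduced in this summary.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Hypothetical vocabulary; the real whitelist is defined by the swarm platform.
CONTROL_NODES = {"BehaviorTree", "Sequence", "Selector"}
ALLOWED_ACTIONS = {"flock", "disperse", "go_to", "return_home"}


def validate_tree(xml_text: str) -> Tuple[bool, str]:
    """Return (accepted, reason); reject malformed XML and unknown nodes."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, f"malformed XML: {exc}"
    if root.tag != "BehaviorTree":
        return False, f"unexpected root <{root.tag}>"
    for node in root.iter():  # visits the root and every descendant
        if node.tag in CONTROL_NODES:
            continue
        if node.tag == "Action":
            name = node.get("name")
            if name not in ALLOWED_ACTIONS:
                return False, f"action {name!r} not in whitelist"
            continue
        return False, f"unknown node <{node.tag}>"
    return True, "ok"
```

A check of this kind is purely syntactic: it can reject unsupported actions and broken XML, but it cannot tell whether an accepted tree matches what the operator meant.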

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Non-expert users could direct complex multi-robot tasks without programming if the synthetic data generalizes to live operations.
The same pipeline structure could be applied to other robot control languages beyond behavior trees.
Adding execution feedback loops might let the system learn new safe primitives over time without expanding the whitelist manually.

Load-bearing premise

The 2,063 synthetic instruction-BT examples and the fixed whitelist of swarm primitives represent the commands and safety limits that real users will actually need.

What would settle it

Run the full pipeline with non-expert operators giving varied spoken commands to physical robot swarms and measure whether any unsafe or unsupported behaviors are executed.

Figures

Figures reproduced from arXiv: 2605.07764 by Amjad Yousef Majid, Mohammed Majid.

**Figure 1.** CommandSwarm system overview. User speech or text is translated into English, filtered for safety, converted by an LLM into an XML BT, and […]

**Figure 2.** Distribution of behavior names in the synthetic instruction–BT […]

**Figure 3.** Translation latency and quality. Top row: Whisper-medium versus SeamlessM4T v2-large for speech translation. Bottom row: EuroLLM-1.7B versus […]

**Figure 4.** Stage-one comparison of eleven 4-bit quantized LLMs under zero-shot, one-shot, and two-shot prompting. Left: BLEU. Middle: ROUGE-L. Right: […]

**Figure 5.** Stage-two evaluation of the strongest three LLMs on 50 held-out behavior-tree examples. Left: BLEU. Middle: ROUGE-L. Right: syntactic […]

**Figure 6.** Prompt-engineered Falcon3 versus LoRA-adapted Falcon3-FT on 50 held-out examples. Left: BLEU. Middle: ROUGE-L. Right: syntactic correctness.

Original abstract

Natural-language interfaces can make swarm robotics more accessible to non-expert operators, but they must translate ambiguous user intent into executable swarm behaviors without unsupported actions, malformed programs, or unsafe plans. This paper presents CommandSwarm, a safety-aware language-to-behavior-tree pipeline for generating XML behavior trees (BTs) from speech or text commands. The system combines multilingual translation, command-level safety filtering, constrained prompting, a LoRA-adapted large language model (LLM), and deterministic parser validation against a whitelist of executable swarm primitives. We evaluate eleven open 6.7B--14B parameter LLMs, all using 4-bit quantization, on representative swarm-control scenarios under zero-shot, one-shot, and two-shot prompting. Falcon3-Instruct-10B and Mistral-7B-v3 are the strongest prompt-engineered candidates, reaching BLEU scores above 0.60 and high syntactic validity in few-shot settings. LoRA adaptation of Falcon3-Instruct-10B on a 2,063-example synthetic instruction--BT corpus improves zero-shot BLEU from 0.267 to 0.663, ROUGE-L from 0.366 to 0.692, and parser-accepted syntactic validity from 0% to 72%. Translation experiments further show that SeamlessM4T v2-large and EuroLLM-9B provide the best quality-latency trade-offs for the multilingual front end. The results indicate that compact, quantized, domain-adapted LLMs can generate useful swarm BTs when embedded in a validated systems pipeline. They also show that parser acceptance and safety filtering remain necessary execution gates; generation quality alone is not sufficient for autonomous deployment.
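The zero-shot, one-shot, and two-shot settings mentioned in the abstract differ only in how many worked examples precede the command in the prompt. The sketch below shows one plausible way to assemble such a constrained prompt; the wording, the `build_prompt` helper, and the example pairs are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical instruction/tree pairs used as in-context examples.
EXAMPLES: List[Tuple[str, str]] = [
    ("gather together", "<BehaviorTree><Action name='flock'/></BehaviorTree>"),
    ("spread out", "<BehaviorTree><Action name='disperse'/></BehaviorTree>"),
]


def build_prompt(
    command: str,
    primitives: List[str],
    examples: List[Tuple[str, str]],
    shots: int = 0,
) -> str:
    """Build a prompt that lists the allowed actions and `shots` examples."""
    if shots < 0 or shots > len(examples):
        raise ValueError("shots must be between 0 and len(examples)")
    lines = [
        "Translate the command into an XML behavior tree.",
        "Use only these actions: " + ", ".join(primitives) + ".",
        "Return XML only.",
    ]
    for instruction, tree in examples[:shots]:
        lines += ["", f"Command: {instruction}", f"Tree: {tree}"]
    lines += ["", f"Command: {command}", "Tree:"]
    return "\n".join(lines)


zero_shot = build_prompt("form a flock", ["flock", "disperse"], EXAMPLES, shots=0)
two_shot = build_prompt("form a flock", ["flock", "disperse"], EXAMPLES, shots=2)
```

Listing the allowed actions in the prompt narrows what the model is asked to produce, but it is only a request; the parser gate is still what enforces the whitelist.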

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CommandSwarm shows clear gains from LoRA adaptation on a synthetic BT corpus but measures everything inside that same synthetic distribution with no robot runs.

The headline result is that LoRA fine-tuning of Falcon3-Instruct-10B on 2,063 synthetic instruction-BT pairs raises zero-shot BLEU from 0.267 to 0.663, ROUGE-L to 0.692, and parser-accepted validity to 72 percent. The pipeline adds command safety filtering and a deterministic parser against a fixed whitelist of swarm primitives before any tree is sent to the robots. That combination is the concrete engineering contribution here.

They also benchmark eleven quantized 6.7B-14B models under zero-, one-, and two-shot prompting and identify Falcon3 and Mistral as the stronger base models before adaptation. The multilingual front-end comparison with SeamlessM4T and EuroLLM is a minor but useful side result for non-English commands. The work is honest about the need for the parser and safety gates; generation quality alone is treated as insufficient.

The evaluation stays inside the synthetic corpus for both training and testing. No human-subject study checks whether real operators produce commands that match the corpus distribution, no out-of-distribution command set is tried, and no closed-loop robot execution or physical safety incidents are reported. Parser acceptance at 72 percent is an improvement over zero, yet it does not confirm that accepted trees match user intent or that the whitelist covers the behaviors operators actually need. Variance across random seeds or runs is not shown.

This paper is aimed at applied robotics groups that want a working natural-language front end for swarm control and are willing to add their own real-world validation layer. A reader already building LLM-to-control pipelines will find the model comparisons and the safety-plus-parser design useful as a reference implementation. It deserves peer review because it delivers a measurable, end-to-end system rather than an untested idea, but any referee should require at least one set of real-robot trials and an out-of-distribution command test before acceptance.

Referee Report

1 major / 2 minor

Summary. The manuscript presents CommandSwarm, a safety-aware natural language to behavior tree (BT) generation pipeline for robotic swarms. It integrates multilingual translation, command-level safety filtering, constrained prompting, LoRA adaptation of LLMs on a synthetic 2,063-example corpus, and deterministic parser validation against a whitelist of primitives. Evaluations across eleven 6.7B-14B LLMs under zero-, one-, and two-shot prompting show Falcon3-Instruct-10B and Mistral-7B-v3 as strong baselines, with LoRA adaptation yielding substantial gains in BLEU (0.267 to 0.663), ROUGE-L (0.366 to 0.692), and parser-accepted validity (0% to 72%) for the adapted model.

Significance. If the synthetic corpus adequately represents real user intents and the system performs well in physical deployments, this could be a significant contribution to human-swarm interaction by enabling non-experts to command complex swarm behaviors safely. The strength lies in the end-to-end validated pipeline rather than generation alone, and the systematic comparison of multiple models and prompting methods offers valuable insights for the field. The use of open, quantized models also supports reproducibility and deployment on resource-constrained platforms.

major comments (1)

[Abstract and Results] The central quantitative claims rely on metrics computed against held-out synthetic instruction-BT pairs from the same distribution as the fine-tuning data. This setup does not address whether the generated BTs would be executable or safe in real robotic swarms, as no closed-loop experiments or human validation studies are reported.

minor comments (2)

[Abstract] No error bars, standard deviations, or details on the number of evaluation runs are provided for the BLEU, ROUGE-L, and validity rates.
The paper would benefit from including a few concrete examples of input commands and corresponding generated BTs in the main text or appendix to illustrate the output quality.
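The first minor comment can be illustrated with a quick calculation of the sampling uncertainty around the validity rate. The sketch assumes the 72% figure was measured on the 50 held-out examples mentioned in the Figure 6 caption (that is, 36 accepted trees out of 50); the paper summary does not confirm this, and the Wilson score interval is a choice made here, not a method the authors report using.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Assumed counts: 36 parser-accepted trees out of 50 held-out examples.
low, high = wilson_interval(successes=36, n=50)
print(f"validity 36/50: approx. 95% interval [{low:.2f}, {high:.2f}]")
```

Under that assumption the interval spans roughly 0.58 to 0.83, which shows how wide the uncertainty is on a test set of this size even before run-to-run variance is considered.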

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback and positive assessment of the pipeline's potential significance. We address the major comment below, clarifying the scope of our synthetic evaluation while acknowledging its limitations for real-world claims.

Referee: [Abstract and Results] The central quantitative claims rely on metrics computed against held-out synthetic instruction-BT pairs from the same distribution as the fine-tuning data. This setup does not address whether the generated BTs would be executable or safe in real robotic swarms, as no closed-loop experiments or human validation studies are reported.

Authors: We agree that the reported metrics (BLEU, ROUGE-L, and parser-accepted validity) are computed on held-out synthetic data drawn from the same distribution as the 2,063-example LoRA training corpus, and that the manuscript contains no closed-loop robotic experiments or human validation studies. Our contribution focuses on the integrated pipeline (multilingual translation, command-level safety filtering, constrained prompting, LoRA adaptation, and deterministic parser validation against executable primitives) rather than end-to-end physical deployment. The abstract and results already note that “parser acceptance and safety filtering remain necessary execution gates; generation quality alone is not sufficient for autonomous deployment.” We have revised the abstract, results discussion, and conclusion to more explicitly frame the synthetic metrics as evidence of generation quality within the controlled domain, to state that real executability and safety require the downstream filters and parser, and to outline future physical validation as a necessary next step.

revision: partial

standing simulated objections not resolved

We cannot add closed-loop experiments or human validation studies in the current revision, as these require physical swarm hardware, real-time execution environments, and user studies that are outside the scope and resources of this work.

Circularity Check

0 steps flagged

No significant circularity; empirical metrics measured directly on held-out synthetic data

full rationale

The paper reports standard machine-learning evaluation results: BLEU/ROUGE-L scores and parser validity percentages computed on held-out examples from the same 2,063-example synthetic corpus used for LoRA fine-tuning. These quantities are obtained via independent, off-the-shelf metrics and a deterministic whitelist parser; they are not algebraically or definitionally forced by the fine-tuning procedure itself. No equations appear in the provided text, no self-definitional loops exist, and no load-bearing self-citations or ansatz smuggling are invoked to justify the central claims. The evaluation pipeline (translation, safety filter, parser) supplies external checks that remain logically independent of the reported generation scores. This is a conventional empirical setup whose results stand or fall on the representativeness of the synthetic data rather than on any internal reduction to the inputs.
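Because the reported scores are surface-overlap metrics, it helps to see what one of them actually measures. Below is a minimal sketch of ROUGE-L as an F-measure over the longest common subsequence of whitespace tokens. The paper's tokenization and metric implementation are not given in this summary, so this version is an assumption and will not reproduce the reported numbers.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 over whitespace tokens (balanced precision and recall)."""
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical trees: the candidate swaps one action for an unsupported one.
reference = "<Sequence> <Action name='flock'/> <Action name='go_to'/> </Sequence>"
candidate = "<Sequence> <Action name='explode'/> <Action name='go_to'/> </Sequence>"
score = rouge_l_f1(reference, candidate)
```

In this toy case a tree containing an unsupported action still scores above 0.8, which is one way to see why overlap metrics and parser acceptance have to be reported separately.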

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard assumptions about behavior-tree expressiveness for swarm tasks and the adequacy of a fixed primitive whitelist; no new physical entities or free parameters beyond conventional LLM training are introduced.

axioms (1)

domain assumption: Behavior trees are a suitable formalism for representing safe, executable swarm behaviors from natural language.
Invoked in the pipeline design and parser validation step.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Each tag gives the relation between the paper passage and the cited Recognition theorem.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "LoRA adaptation of Falcon3-Instruct-10B on a 2,063-example synthetic instruction-BT corpus improves zero-shot BLEU from 0.267 to 0.663"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "parser-accepted syntactic validity from 0% to 72%"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
