RIPA: Sensory-Vector Prompt Injection Attacks on LLM-Controlled ROS 2 Robots
Pith reviewed 2026-06-30 09:26 UTC · model grok-4.3
The pith
Sensory inputs enable prompt injection attacks on LLM-controlled robots, with success varying by model rather than size.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that sensory-vector prompt injection attacks produce model-specific vulnerability profiles in LLM-controlled ROS 2 robots rather than monotonic scaling with parameter count. Llama-3.3-70B-Instruct-Turbo reaches 100 percent ASR on all variants while Llama-3-8B-Instruct-Lite and Qwen 2.5-7B-Instruct-Turbo reach 0 percent on direct-override injections, and the approximately 4B Gemma-3n-E4B matches the 70B profile. Three channels are defined, with LiDAR context poisoning achieving 100 percent ASR on DeepSeek-V4-Flash, and a proposed hybrid firewall shows a 10.2 percent bypass rate on 19 obfuscation payloads.
What carries the argument
The three sensory injection channels that embed prompts into visual OCR output, audio STT transcripts, and LiDAR obstacle context at the LLM system-prompt level.
If this is right
- Direct text overrides are unnecessary when sensory data can deliver effective injections.
- Model choice for robotic control must be evaluated against specific injection profiles rather than parameter count.
- Hybrid semantic firewalls require additional layers to address obfuscated sensory payloads.
- LiDAR context poisoning forms a high-success route for altering robot environment state representations.
Where Pith is reading between the lines
- Similar sensory-channel attacks could extend to other multimodal LLM systems that ingest physical sensor streams.
- Robot architectures may need separate sensor validation modules before data reaches the LLM prompt.
- Security evaluations for embodied LLMs should incorporate simulated physical sensor inputs beyond text-only tests.
Load-bearing premise
The 100 independent runs per variant and the direct embedding of sensory data into the LLM system prompt capture real attack feasibility without unstated preprocessing or filtering in the ROS 2 pipeline.
What would settle it
Repeated trials on the LiDAR channel for DeepSeek-V4-Flash that produce zero successful injections despite insertion of fabricated obstacle data would falsify the 100 percent ASR result.
Figures
read the original abstract
We present RIPA, the first systematic multi-channel empirical study of prompt injection attacks delivered through the sensory pipeline of a ROS 2-based LLM-controlled robotic system. Across 100 independent runs per injection variant on five LLMs spanning four model families and parameter scales from approximately 4B to approximately 284B (DeepSeek-V4-Flash, Llama-3-8B-Instruct-Lite, Llama-3.3-70B-Instruct-Turbo, Qwen 2.5-7B-Instruct-Turbo, Gemma-3n-E4B), we identify model-specific vulnerability profiles that do not follow a monotonic scaling trend: Llama-3.3-70B-Instruct-Turbo exhibits 100% attack success rate (ASR) across all injection variants, while Llama-3-8B-Instruct-Lite and Qwen 2.5-7B-Instruct-Turbo resist direct-override injection (0% ASR), and the smallest model evaluated (Gemma-3n-E4B, approximately 4B) matches the 70B model's vulnerability profile, indicating that robustness is model-specific rather than scale-dependent. We propose a hybrid semantic firewall that achieves 0% ASR against known injection patterns with no false positives on a preliminary benign set (0/20 commands) but exhibits a 10.2% trial-weighted bypass rate (58/570 trials; N equals 30 per payload across 19 obfuscation payloads) against adversarially obfuscated attacks, exposing a critical gap between rule-based and semantic defense layers. We further introduce three sensory injection channels: visual (Channel 1, via OCR), audio (Channel 2, via Whisper STT), and LiDAR sensor context poisoning (Channel 3). We show that Channel 3, which injects fabricated obstacle data into the robot environment-state representation at the LLM system-prompt level, achieves 100% ASR across all variants on DeepSeek-V4-Flash. We also contribute a firewall bypass taxonomy spanning 19 obfuscation payloads across five categories. All code, data, and results are publicly available.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces RIPA, the first systematic multi-channel empirical study of prompt injection attacks on LLM-controlled ROS 2 robots delivered via sensory pipelines (visual OCR, audio via Whisper STT, and LiDAR sensor context poisoning). Across 100 independent runs per injection variant on five LLMs spanning ~4B to ~284B parameters, it reports model-specific ASR profiles that do not follow monotonic scaling: 100% ASR for Llama-3.3-70B-Instruct-Turbo and Gemma-3n-E4B across variants, versus 0% ASR for Llama-3-8B-Instruct-Lite and Qwen 2.5-7B-Instruct-Turbo on direct-override; Channel 3 achieves 100% ASR on DeepSeek-V4-Flash. It proposes a hybrid semantic firewall with 0% ASR on known patterns (0/20 benign) but 10.2% bypass rate (58/570 trials) on 19 obfuscated payloads, and contributes a firewall bypass taxonomy. All code, data, and results are publicly available.
Significance. If the empirical measurements hold, the work is significant for showing that LLM robustness to sensory prompt injection in robotic systems is model-specific rather than scale-dependent, with direct implications for model selection in embodied AI. The multi-channel attack vectors and the firewall bypass taxonomy provide actionable insights into defense gaps. The public release of code, data, and results is a clear strength, supporting reproducibility and allowing independent verification of the reported ASR values and run independence.
major comments (2)
- [Abstract] Abstract: The central claim that 'robustness is model-specific rather than scale-dependent' is based on the reported ASR differences (100% for Llama-3.3-70B-Instruct-Turbo and Gemma-3n-E4B vs. 0% for Llama-3-8B-Instruct-Lite and Qwen 2.5-7B-Instruct-Turbo on direct-override). However, sensory data integration is described only as occurring 'at the LLM system-prompt level' without specifying whether a single fixed template is used for all models or whether any model-dependent preprocessing, tokenization, truncation, or filtering is applied in the ROS 2 pipeline before prompt assembly. This detail is load-bearing for attributing the non-monotonic profile to intrinsic model behavior rather than pipeline artifacts.
- [Abstract] Abstract (firewall evaluation): The hybrid semantic firewall is stated to achieve 0% ASR against known injection patterns with no false positives on a preliminary benign set (0/20 commands). The small benign-set size and lack of detail on how the 570 obfuscated trials were constructed (N=30 per payload across 19 payloads) make it difficult to assess whether the 10.2% bypass rate (58/570) generalizes or whether the 0% false-positive claim is robust.
minor comments (2)
- [Abstract] Abstract: 'N equals 30' should be formatted consistently as N=30; similarly, parameter counts are given as 'approximately 4B' and 'approximately 284B' without exact figures or citations to model cards.
- [Abstract] Abstract: The statement that 'All code, data, and results are publicly available' is a strength but would benefit from an explicit repository URL or DOI in the abstract for immediate accessibility.
Simulated Author's Rebuttal
We thank the referee for their detailed comments on the abstract and evaluation methodology. We address each major comment below and will revise the manuscript to incorporate clarifications where appropriate.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that 'robustness is model-specific rather than scale-dependent' is based on the reported ASR differences (100% for Llama-3.3-70B-Instruct-Turbo and Gemma-3n-E4B vs. 0% for Llama-3-8B-Instruct-Lite and Qwen 2.5-7B-Instruct-Turbo on direct-override). However, sensory data integration is described only as occurring 'at the LLM system-prompt level' without specifying whether a single fixed template is used for all models or whether any model-dependent preprocessing, tokenization, truncation, or filtering is applied in the ROS 2 pipeline before prompt assembly. This detail is load-bearing for attributing the non-monotonic profile to intrinsic model behavior rather than pipeline artifacts.
Authors: The ROS 2 pipeline implements a single fixed system-prompt template for sensory data integration across all five evaluated models, with no model-dependent preprocessing, tokenization, truncation, or filtering steps. This uniform assembly process was designed to isolate differences in model behavior. We will revise the methods section (and abstract if space permits) to explicitly document the fixed template and confirm the absence of model-specific pipeline variations. revision: yes
-
Referee: [Abstract] Abstract (firewall evaluation): The hybrid semantic firewall is stated to achieve 0% ASR against known injection patterns with no false positives on a preliminary benign set (0/20 commands). The small benign-set size and lack of detail on how the 570 obfuscated trials were constructed (N=30 per payload across 19 payloads) make it difficult to assess whether the 10.2% bypass rate (58/570) generalizes or whether the 0% false-positive claim is robust.
Authors: The 570 trials comprise 19 obfuscation payloads (grouped into the five-category taxonomy presented in the paper), each run for 30 independent trials. The benign set of 20 commands is labeled preliminary in the manuscript. We will expand the firewall evaluation section to detail the payload construction process and taxonomy categories, and we will explicitly note the preliminary nature of the benign-set results as a limitation. No changes to the reported rates are required. revision: yes
Circularity Check
No circularity: purely empirical measurements with no derivations or self-referential reductions
full rationale
The paper reports an empirical study of prompt injection attacks on LLM-controlled robots via ROS 2 sensory channels. It measures attack success rates across models and injection variants through direct experimentation (100 runs per variant), proposes a firewall based on observed results, and contributes a taxonomy from the same experiments. No equations, fitted parameters, predictions derived from inputs, or load-bearing self-citations appear in the provided text or abstract. The central claims (model-specific vulnerability profiles, firewall performance) rest on raw experimental outcomes rather than any reduction to prior results by construction. This matches the default expectation for non-circular empirical work.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Ignore Previous Prompt: Attack Techniques For Language Models
F. Perez and I. Ribeiro, “Ignore previous prompt: Attack techniques for language models,” inNeurIPS ML Safety Workshop, 2022. [Online]. Available: https://doi.org/10.48550/arXiv.2211.09527 10
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2211.09527 2022
-
[3]
Jailbreaking LLM-controlled robots,
A. Robey, Z. Ravichandran, V . Kumar, H. Hassani, and G. J. Pappas, “Jailbreaking LLM-controlled robots,” arXiv preprint, 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2410.13691
-
[4]
BadRobot: Jailbreaking Embodied LLM Agents in the Physical World
H. Zhang, C. Zhu, X. Wang, Z. Zhou,et al., “BadRobot: Jailbreaking embodied LLMs in the physical world,” inProc. ICLR, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2407.20242
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2407.20242 2025
-
[5]
Prompt injection attack against LLM-integrated mobile robotic systems,
W. Zhang, X. Kong, C. Dewitt, T. Braunl, and J. B. Hong, “Prompt injection attack against LLM-integrated mobile robotic systems,” inProc. IEEE 35th Int. Symp. Softw. Reliability Eng. Workshops (ISSREW), 2024. [Online]. Available: https://doi.org/10.1109/ISSREW63542.2024.00103
-
[8]
From prompt to physical action: Structured backdoor attacks on LLM-mediated robotic control systems,
M. Xie and J. Wei-Kocsis, “From prompt to physical action: Structured backdoor attacks on LLM-mediated robotic control systems,” arXiv preprint, 2026. [Online]. Available: https://doi.org/10.48550/arXiv.2604. 03890
-
[9]
E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, and M. Vechev, “AgentDojo: A dynamic environment for evaluating prompt injection attacks and defenses for LLM agents,” inProc. NeurIPS (Datasets and Benchmarks Track), 2024. [Online]. Available: https://doi. org/10.52202/079017-2636
-
[10]
Training language models to follow instructions with human feedback,
L. Ouyang, J. Wu, X. Jiang, D. Almeida,et al., “Training language models to follow instructions with human feedback,” inProc. NeurIPS,
-
[11]
Available: https://doi.org/10.52202/068431-2011
[Online]. Available: https://doi.org/10.52202/068431-2011
-
[12]
Robot Operating System 2: Design, architecture, and uses in the wild,
S. Macenski, T. Foote, B. Gerkey, C. Lalancette, and W. Woodall, “Robot Operating System 2: Design, architecture, and uses in the wild,”Science Robotics, vol. 7, no. 66, p. eabm6074, 2022. [Online]. Available: https: //doi.org/10.1126/scirobotics.abm6074
-
[13]
Robust Speech Recognition via Large-Scale Weak Supervision
A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust speech recognition via large-scale weak supervision,” inProc. ICML, 2023. [Online]. Available: https://doi.org/10.48550/arXiv. 2212.04356
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv 2023
-
[14]
Towards robust and secure embodied AI: A survey on vulnerabilities and attacks,
W. Xing, M. Li, M. Li, and M. Han, “Towards robust and secure embodied AI: A survey on vulnerabilities and attacks,”ACM Computing Surveys,
-
[15]
Available: https://doi.org/10.1145/3806048
[Online]. Available: https://doi.org/10.1145/3806048
-
[16]
Propagating Unsafe Actions in LLM Controlled Multi-Robot Collaboration via Single Robot Compromise
Z. Huang, Z. Liu, M. Luo, W. Wu, and Z. Cai, “Propagating unsafe actions in LLM controlled multi-robot collaboration via single robot compromise,” arXiv preprint, 2026. [Online]. Available: https://doi.org/ 10.48550/arXiv.2605.15641
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2605.15641 2026
-
[17]
SafeEmbodAI: A safety framework for mobile robots in embodied AI systems,
W. Zhang, X. Kong, T. Braunl, and J. B. Hong, “SafeEmbodAI: A safety framework for mobile robots in embodied AI systems,” arXiv preprint,
-
[18]
Available: https://doi.org/10.48550/arXiv.2409.01630
[Online]. Available: https://doi.org/10.48550/arXiv.2409.01630
-
[19]
Prompt Injection Attack to Tool Selection in LLM Agents
J. Shi, Z. Yuan, G. Tie, P. Zhou, N. Z. Gong, and L. Sun, “Prompt injection attack to tool selection in LLM agents,” inProc. NDSS, 2026. [Online]. Available: https://doi.org/10.48550/arXiv.2504.19793 11 Appendix A Technology Stack Component Version / Details OS Ubuntu 24.04 (WSL2 on Windows 11) ROS ROS 2 Jazzy Simulator Gazebo Harmonic Robot TurtleBot3 Wa...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2504.19793 2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.