pith. sign in

arxiv: 2606.12978 · v1 · pith:HQSNFHOZnew · submitted 2026-06-11 · 💻 cs.RO · cs.CV· cs.SY· eess.SY

Trajectory-Level Redirection Attacks on Vision-Language-Action Models

Pith reviewed 2026-06-27 06:31 UTC · model grok-4.3

classification 💻 cs.RO cs.CVcs.SYeess.SY
keywords vision-language-action modelstrajectory redirectionprompt attacksrobot manipulationadversarial promptsclosed-loop controlcommand preservation
0
0 comments X

The pith

Near-benign prompt perturbations can redirect VLA rollouts to attacker-specified targets under a prompt-only threat model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that vision-language-action policies, which control robots from repeated text prompts, admit a trajectory-level attack in which one altered prompt steers the entire closed-loop episode to a different physical outcome. The altered prompt must remain close to the benign instruction, omit any mention of the target task, and avoid correction language, yet still produce the attacker-chosen result across changing observations. The authors formalize this as command-preserving trajectory redirection and supply an on-policy search procedure that evaluates candidate prompts by executing full rollouts to locate perturbations meeting the constraints. Experiments confirm the attacks succeed in both simulation and on physical hardware, showing that instruction grounding in these models can be subverted at the level of final physical state without obvious textual signals.

Core claim

A single attacker-chosen prompt, fixed before the episode, can make a VLA policy's closed-loop behavior track an attacker-specified target task while satisfying the constraints of command-preserving trajectory redirection: the prompt stays near the benign instruction, omits target words and correction language, and all other policy and environment components remain unchanged. The prompt is found by an on-policy search that uses rollouts to identify perturbations whose executed trajectories match the target.

What carries the argument

Command-preserving trajectory redirection, a prompt-only threat model in which one fixed prompt redirects the full trajectory to a new target while staying close to the original instruction and omitting target words or corrections, located via on-policy prompt search that evaluates candidates through actual rollouts.

If this is right

  • Prompt perturbations achieve redirection to attacker-specified targets while obeying the closeness and omission constraints.
  • The on-policy rollout search locates perturbations whose closed-loop trajectories track the chosen target task.
  • The redirection occurs in both simulated and physical hardware settings.
  • Text that appears to preserve the intended command can still give control over the robot's final physical outcome.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Safety checks for VLA systems may need to evaluate candidate prompts by simulating their potential full trajectories rather than inspecting text alone.
  • The same redirection risk could appear in other closed-loop language-controlled processes where instructions are reused across steps.
  • Applying the search across wider task distributions would reveal whether the redirection capability scales or breaks on more complex behaviors.

Load-bearing premise

The assumption that an on-policy prompt search using rollouts can reliably discover perturbations that satisfy the command-preserving constraints while achieving the target trajectory without the search itself introducing artifacts that would not appear in a real deployment.

What would settle it

Execute the prompts discovered by the search method inside an actual deployed VLA system that never ran the search procedure, and check whether the redirection to the attacker target still occurs or whether no qualifying near-benign prompts exist for the tested tasks.

Figures

Figures reproduced from arXiv: 2606.12978 by Dilek Hakkani-T\"ur, Gokul Puthumanaillam, Hooshang Nayyeri, Melkior Ornik, Pranay Thangeda, Vardhan Dongre.

Figure 1
Figure 1. Figure 1: A command-preserving prompt perturbation redirects the closed-loop VLA trajectory toward an adversary-specified physical goal. Although the adversarial prompt differs from the benign instruction by only a small text change and contains no explicit target command, the robot places the bowl on the plate rather than the stove. However, eliciting a targeted action at one inference step, or making that action p… view at source ↗
Figure 2
Figure 2. Figure 2: High-level overview of the on-policy teacher-matching prompt search. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Causal trace of a command-preserving perturbation [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Closed-loop target affinity on attack-induced states. [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Perturbation budget versus search cost. We vary the maximum number of prompt tokens the search may modify. KF#4: Small perturbation budgets already suf￾fice, while larger budgets reduce search cost. Av￾eraging the reported edit column over the VLA rows in [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Hardware validation. (a) Qualitative results from SmolVLA. (b) Performance of SO100-finetuned VLAs. queries to find a successful perturbation; with a larger budget, the same attack objective becomes easier to satisfy and the number of queries drop. KF#5: The attack survives real-robot deployment. The hardware results show that command￾preserving redirection is not only a simulator artifact. Under the benig… view at source ↗
Figure 7
Figure 7. Figure 7: LIBERO simulation. Unless otherwise stated, LIBERO actions are represented in a 7- D single-arm action space consisting of translation, rotation, and gripper components. For models whose native action representation is discrete action tokens, generated text, or a higher-dimensional padded action vector, actions are decoded using the corresponding model wrapper before environment execution. Teacher-matching… view at source ↗
Figure 8
Figure 8. Figure 8: Hardware setup. The real-robot experiments use a Standard Open Arm 100 (SO-100) tabletop manipulation setup. The SO-100 follower arm has five ac￾tuated arm degrees of freedom and one actuated gripper degree of freedom. We therefore refer to the hardware control interface as a six-channel joint-and-gripper interface: five arm joints plus one gripper command. The gripper is treated as part of the robot actio… view at source ↗
Figure 9
Figure 9. Figure 9: Additional simulation results 44 [PITH_FULL_IMAGE:figures/full_fig_p044_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Additional hardware results. All the benign prompts are evaluated on π 0.5 45 [PITH_FULL_IMAGE:figures/full_fig_p045_10.png] view at source ↗
read the original abstract

Vision-language-action (VLA) policies bring natural language into closed-loop robot control, enabling robots to execute manipulation tasks directly from text instructions. The same interface gives text a recurring role in control because the prompt is reused at every replanning step, and each prompt-conditioned action changes the future observations on which the policy acts. Existing VLA attacks study adversarial prompts that elicit targeted low-level actions or make such actions persist across changing images. We identify a stronger trajectory-level failure mode: a prompt that still $\textit{appears}$ to specify the intended task but redirects the final physical outcome. We mathematically formalize this setting as $\textit{command-preserving trajectory redirection}$, a prompt-only threat model in which the attacker chooses one prompt before the episode, all policy and environment components remain fixed, and the prompt must stay close to the benign instruction while omitting target words and correction language. To find such prompts, we introduce an on-policy prompt search method that uses rollouts to discover perturbations whose closed-loop behavior tracks a target task while satisfying the command-preserving constraints. Experiments in simulation and on hardware show that near-benign prompt perturbations can redirect VLA rollouts to attacker-specified targets. These results expose a trajectory-level vulnerability in VLA instruction grounding: text that appears to preserve the intended command can still give an adversary control over the robot's final physical outcome. Project website: https://vla-redirection-attack.github.io/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that near-benign prompt perturbations, found via an on-policy search over rollouts, can redirect closed-loop VLA trajectories to attacker-specified targets while satisfying a command-preserving threat model (prompt chosen once before the episode, remains close to the benign instruction, omits target words and correction language, and all other components fixed). Simulation and hardware experiments are said to support the existence of such prompts.

Significance. If the central claim holds under the stated threat model, the result would identify a trajectory-level vulnerability in VLA instruction grounding that goes beyond existing low-level action attacks, with potential implications for the security of language-conditioned robot policies.

major comments (1)
  1. [Abstract (threat model and method description)] The threat model (stated in the abstract) requires the attacker to select one fixed prompt before the episode begins, with all policy and environment components remaining fixed thereafter. However, the on-policy prompt search method relies on repeated full rollouts to discover and optimize the perturbation while enforcing the command-preserving constraints inside the same loop. This procedure implicitly grants the attacker repeated execution access during discovery, which appears inconsistent with the one-shot pre-episode choice and raises the possibility that reported successes exploit transient state sequences or observation statistics induced by the search itself rather than generalizing to a single fixed prompt chosen without further interaction.
minor comments (1)
  1. [Abstract] The abstract states that simulation and hardware experiments support the claim but provides no quantitative results, success rates, number of trials, or details on how redirection success and command preservation were measured.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and for identifying a potential ambiguity between the threat model and the prompt discovery procedure. We address the comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract (threat model and method description)] The threat model (stated in the abstract) requires the attacker to select one fixed prompt before the episode begins, with all policy and environment components remaining fixed thereafter. However, the on-policy prompt search method relies on repeated full rollouts to discover and optimize the perturbation while enforcing the command-preserving constraints inside the same loop. This procedure implicitly grants the attacker repeated execution access during discovery, which appears inconsistent with the one-shot pre-episode choice and raises the possibility that reported successes exploit transient state sequences or observation statistics induced by the search itself rather than generalizing to a single fixed prompt chosen without further interaction.

    Authors: We agree that the distinction between prompt discovery and deployment requires clearer separation to avoid ambiguity. The on-policy search is an offline optimization performed by the attacker (typically in simulation or with white-box access to the policy) to identify a prompt satisfying the command-preserving constraints; once identified, that single prompt is fixed and used for the entire episode with no further interaction or adaptation. The reported results evaluate the fixed prompt on independent rollouts (including hardware transfers), not on the search trajectories themselves. We will revise the abstract, threat model section, and method description to explicitly delineate the discovery phase (offline, attacker-controlled) from the attack phase (one-shot, fixed prompt, no repeated access), and we will add explicit statements that success is measured on held-out episodes after search termination. revision: partial

Circularity Check

0 steps flagged

No circularity detected in empirical attack paper

full rationale

The paper is an empirical demonstration of a prompt-only attack on VLA policies. It formalizes a threat model and introduces an on-policy search procedure whose success is shown via simulation and hardware experiments. No equations, fitted parameters, or derivations are presented that reduce a claimed result to its own inputs by construction. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results appear in the provided text. The central claim rests on experimental outcomes rather than any self-referential reduction, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the existence of a closed-loop VLA policy that reuses the same prompt at every replanning step and on the ability of an on-policy search to locate suitable perturbations; no free parameters, ad-hoc axioms, or new physical entities are introduced.

axioms (1)
  • domain assumption VLA policies condition actions on the same text prompt at every replanning step
    Stated in the abstract as the mechanism that gives text a recurring role in control.
invented entities (1)
  • command-preserving trajectory redirection threat model no independent evidence
    purpose: Formalizes the attack setting in which a single prompt redirects the final physical outcome while remaining close to the benign instruction
    Newly defined in the paper; no independent evidence outside the attack demonstration itself.

pith-pipeline@v0.9.1-grok · 5823 in / 1284 out tokens · 18188 ms · 2026-06-27T06:31:57.856865+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

54 extracted references · 17 linked inside Pith

  1. [2]

    M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, et al. Openvla: An open-source vision-language-action model.arXiv preprint arXiv:2406.09246, 2024

  2. [3]

    Zitkovich, T

    B. Zitkovich, T. Yu, S. Xu, P. Xu, T. Xiao, F. Xia, J. Wu, P. Wohlhart, S. Welker, A. Wahid, et al. Rt-2: Vision-language-action models transfer web knowledge to robotic control. In Conference on Robot Learning, pages 2165–2183. PMLR, 2023

  3. [4]

    Black, N

    K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, et al.pi 0: A vision-language-action flow model for general robot control.arXiv preprint arXiv:2410.24164, 2024

  4. [5]

    E. K. Jones, A. Robey, A. Zou, Z. Ravichandran, G. J. Pappas, H. Hassani, M. Fredrikson, and J. Z. Kolter. Adversarial attacks on robotic vision language action models.arXiv preprint arXiv:2506.03350, 2025

  5. [6]

    A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson. Universal and transfer- able adversarial attacks on aligned language models.arXiv preprint arXiv:2307.15043, 2023

  6. [8]

    S. Ross, G. Gordon, and D. Bagnell. A reduction of imitation learning and structured predic- tion to no-regret online learning. InProceedings of the fourteenth international conference on artificial intelligence and statistics, pages 627–635. JMLR Workshop and Conference Pro- ceedings, 2011

  7. [9]

    Brohan, N

    A. Brohan, N. Brown, J. Carbajal, Y . Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Haus- man, A. Herzog, J. Hsu, et al. Rt-1: Robotics transformer for real-world control at scale.arXiv preprint arXiv:2212.06817, 2022

  8. [10]

    O’Neill, A

    A. O’Neill, A. Rehman, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, A. Mandlekar, A. Jain, et al. Open x-embodiment: Robotic learning datasets and rt-x models: Open x-embodiment collaboration 0. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 6892–6903. IEEE, 2024

  9. [11]

    Driess, F

    D. Driess, F. Xia, M. S. Sajjadi, C. Lynch, A. Chowdhery, B. Ichter, A. Wahid, J. Tompson, Q. Vuong, T. Yu, et al. Palm-e: An embodied multimodal language model.arXiv preprint arXiv:2303.03378, 2023

  10. [12]

    J. Lee, J. Duan, H. Fang, Y . Deng, S. Liu, B. Li, B. Fang, J. Zhang, Y . R. Wang, S. Lee, et al. Molmoact: Action reasoning models that can reason in space.arXiv preprint arXiv:2508.07917, 2025. 9

  11. [13]

    Pertsch, K

    K. Pertsch, K. Stachowicz, B. Ichter, D. Driess, S. Nair, Q. Vuong, O. Mees, C. Finn, and S. Levine. FAST: Efficient action tokenization for vision-language-action models.arXiv preprint arXiv:2501.09747, 2025

  12. [14]

    Y . Wang, H. Zhu, M. Liu, J. Yang, H.-S. Fang, and T. He. VQ-VLA: Improving vision- language-action models via scaling vector-quantized action tokenizers. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025

  13. [15]

    Bousmalis, G

    K. Bousmalis, G. Vezzani, D. Rao, C. Devin, A. X. Lee, M. Bauza, T. Davchev, Y . Zhou, A. Gupta, et al. RoboCat: A self-improving generalist agent for robotic manipulation.Trans- actions on Machine Learning Research (TMLR), 2023

  14. [16]

    M. J. Kim, C. Finn, and P. Liang. Fine-tuning vision-language-action models: Optimizing speed and success.arXiv preprint arXiv:2502.19645, 2025

  15. [17]

    Ghosh, H

    Octo Model Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu, J. Luo, Y . L. Tan, P. Sanketi, Q. Vuong, T. Xiao, D. Sadigh, C. Finn, and S. Levine. Octo: An open-source generalist robot policy.arXiv preprint arXiv:2405.12213, 2024

  16. [18]

    Black, N

    Physical Intelligence, K. Black, N. Brown, J. Darpinian, K. Dhabalia, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, et al.π 0.5: a vision-language-action model with open-world generalization.arXiv preprint arXiv:2504.16054, 2025

  17. [19]

    S. Liu, L. Wu, B. Li, H. Tan, H. Chen, Z. Wang, K. Xu, H. Su, and J. Zhu. RDT-1B: a diffusion foundation model for bimanual manipulation.arXiv preprint arXiv:2410.07864, 2024

  18. [20]

    Y . Li, Y . Deng, J. Zhang, J. Jang, M. Memmel, C. Garrett, F. Ramos, D. Fox, A. Li, A. Gupta, et al. Hamster: Hierarchical action models for open-world robot manipulation. InInternational Conference on Learning Representations, volume 2025, pages 24040–24068, 2025

  19. [21]

    D. Qu, H. Song, Q. Chen, Y . Yao, X. Ye, Y . Ding, Z. Wang, J. Gu, B. Zhao, D. Wang, et al. Spatialvla: Exploring spatial representations for visual-language-action model.arXiv preprint arXiv:2501.15830, 2025

  20. [22]

    Y . Ji, H. Tan, J. Shi, X. Hao, Y . Zhang, H. Zhang, P. Wang, M. Zhao, Y . Mu, P. An, et al. Robobrain: A unified brain model for robotic manipulation from abstract to concrete. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1724–1734, 2025

  21. [23]

    Shukor, D

    M. Shukor, D. Aubakirova, F. Capuano, P. Kooijmans, S. Palma, A. Zouitine, M. Aractingi, C. Pascal, M. Russi, A. Marafioti, et al. Smolvla: A vision-language-action model for afford- able and efficient robotics.arXiv preprint arXiv:2506.01844, 2025

  22. [24]

    Jiang, X

    T. Jiang, X. Jiang, Y . Ma, X. Wen, B. Li, K. Zhan, P. Jia, Y . Liu, S. Sun, and X. Lang. The better you learn, the smarter you prune: Towards efficient vision-language-action models via differentiable token pruning.arXiv preprint arXiv:2509.12594, 2025

  23. [25]

    J. Wen, Y . Zhu, J. Li, M. Zhu, Z. Tang, K. Wu, Z. Xu, N. Liu, R. Cheng, C. Shen, et al. Tinyvla: Towards fast, data-efficient vision-language-action models for robotic manipulation. IEEE Robotics and Automation Letters, 2025

  24. [26]

    Goyal, H

    A. Goyal, H. Hadfield, X. Yang, V . Blukis, and F. Ramos. Vla-0: Building state-of-the-art vlas with zero modification.arXiv preprint arXiv:2510.13054, 2025

  25. [27]

    D. Niu, Y . Sharma, G. Biamby, J. Quenum, Y . Bai, B. Shi, T. Darrell, and R. Herzig. Llarva: Vision-action instruction tuning enhances robot learning.arXiv preprint arXiv:2406.11815, 2024. 10

  26. [28]

    Jia and P

    R. Jia and P. Liang. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2021–2031, Sept. 2017

  27. [29]

    Ebrahimi, A

    J. Ebrahimi, A. Rao, D. Lowd, and D. Dou. HotFlip: White-box adversarial examples for text classification. InProceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 31–36, 2018

  28. [30]

    J. Gao, J. Lanchantin, M. L. Soffa, and Y . Qi. Black-box generation of adversarial text se- quences to evade deep learning classifiers. In2018 IEEE security and privacy workshops (SPW), pages 50–56. IEEE, 2018

  29. [31]

    J. Li, S. Ji, T. Du, B. Li, and T. Wang. Textbugger: Generating adversarial text against real- world applications.arXiv preprint arXiv:1812.05271, 2018

  30. [32]

    S. Ren, Y . Deng, K. He, and W. Che. Generating natural language adversarial examples through probability weighted word saliency. InProceedings of the 57th Annual Meeting of the Associ- ation for Computational Linguistics, pages 1085–1097, July 2019

  31. [33]

    Alzantot, Y

    M. Alzantot, Y . Sharma, A. Elgohary, B.-J. Ho, M. Srivastava, and K.-W. Chang. Generating natural language adversarial examples. InProceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2890–2896, 2018

  32. [34]

    D. Jin, Z. Jin, J. T. Zhou, and P. Szolovits. Is bert really robust? a strong baseline for natural language attack on text classification and entailment, 2020. URLhttps://arxiv.org/abs/ 1907.11932

  33. [35]

    L. Li, R. Ma, Q. Guo, X. Xue, and X. Qiu. BERT-ATTACK: Adversarial attack against BERT using BERT. InProceedings of the 2020 Conference on Empirical Methods in Natural Lan- guage Processing (EMNLP), pages 6193–6202, Nov. 2020

  34. [36]

    D. Li, Y . Zhang, H. Peng, L. Chen, C. Brockett, M.-T. Sun, and B. Dolan. Contextualized perturbation for textual adversarial attack. InProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Tech- nologies, pages 5053–5069, June 2021

  35. [37]

    Garg and G

    S. Garg and G. Ramakrishnan. BAE: BERT-based adversarial examples for text classification. InProceedings of the 2020 Conference on Empirical Methods in Natural Language Process- ing (EMNLP), pages 6174–6181, Nov. 2020. URLhttps://aclanthology.org/2020. emnlp-main.498/

  36. [38]

    Boucher, I

    N. Boucher, I. Shumailov, R. Anderson, and N. Papernot. Bad characters: Imperceptible nlp attacks, 2021. URLhttps://arxiv.org/abs/2106.09898

  37. [39]

    Wallace, S

    E. Wallace, S. Feng, N. Kandpal, M. Gardner, and S. Singh. Universal adversarial triggers for attacking and analyzing NLP. InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2153–2162, Nov. 2019

  38. [40]

    T. Shin, Y . Razeghi, R. L. L. IV , E. Wallace, and S. Singh. Autoprompt: Eliciting knowledge from language models with automatically generated prompts, 2020. URLhttps://arxiv. org/abs/2010.15980

  39. [41]

    Schwinn, D

    L. Schwinn, D. Dobre, S. Xhonneux, G. Gidel, and S. G ¨unnemann. Soft prompt threats: Attacking safety alignment and unlearning in open-source llms through the embedding space. Advances in Neural Information Processing Systems, 37:9086–9116, 2024

  40. [42]

    X. Liu, N. Xu, M. Chen, and C. Xiao. Autodan: Generating stealthy jailbreak prompts on aligned large language models, 2024. URLhttps://arxiv.org/abs/2310.04451. 11

  41. [43]

    K. Zhu, J. Wang, J. Zhou, Z. Wang, H. Chen, Y . Wang, L. Yang, W. Ye, Y . Zhang, N. Z. Gong, and X. Xie. Promptrobust: Towards evaluating the robustness of large language models on adversarial prompts, 2024. URLhttps://arxiv.org/abs/2306.04528

  42. [45]

    H. Lu, Y . Yu, Y . Yang, C. Yi, Q. Zhang, B. Shen, A. C. Kot, and X. Jiang. When robots obey the patch: Universal transferable patch attacks on vision-language-action models, 2026. URL https://arxiv.org/abs/2511.21192

  43. [46]

    T. Wang, C. Han, J. C. Liang, W. Yang, D. Liu, L. X. Zhang, Q. Wang, J. Luo, and R. Tang. Exploring the adversarial vulnerabilities of vision-language-action models in robotics, 2025. URLhttps://arxiv.org/abs/2411.13587

  44. [47]

    X. Wang, J. Li, Z. Weng, Y . Wang, Y . Gao, T. Pang, C. Du, Y . Teng, Y . Wang, Z. Wu, X. Ma, and Y .-G. Jiang. Freezevla: Action-freezing attacks against vision-language-action models,

  45. [48]

    URLhttps://arxiv.org/abs/2509.19870

  46. [49]

    Zhang, W

    N. Zhang, W. Tao, X. Xiao, Q. Sun, Y . Zheng, W. Mo, P. Wang, and N. Zhang. Attention- guided patch-wise sparse adversarial attacks on vision-language-action models, 2025. URL https://arxiv.org/abs/2511.21663

  47. [50]

    Y . Yan, Y . Xie, Y . Zhang, L. Lyu, H. Wang, and Y . Jin. When alignment fails: Multimodal ad- versarial attacks on vision-language-action models, 2025. URLhttps://arxiv.org/abs/ 2511.16203

  48. [51]

    Q. Li, B. Yin, W. Huang, R. Liu, B. Zou, R. Yu, J. Ye, W. Yu, and X. Wang. Vision- language-action safety: Threats, challenges, evaluations, and mechanisms, 2026. URL https://arxiv.org/abs/2604.23775

  49. [52]

    Ebrahimi, A

    J. Ebrahimi, A. Rao, D. Lowd, and D. Dou. Hotflip: White-box adversarial examples for text classification. InProceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018

  50. [53]

    J. Gao, J. Lanchantin, M. L. Soffa, and Y . Qi. Black-box generation of adversarial text se- quences to evade deep learning classifiers. InIEEE Security and Privacy Workshops (SPW), 2018

  51. [54]

    Wallace, S

    E. Wallace, S. Feng, N. Kandpal, M. Gardner, and S. Singh. Universal adversarial triggers for attacking and analyzing nlp. InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019

  52. [55]

    B. Liu, Y . Zhu, C. Gao, Y . Feng, Q. Liu, Y . Zhu, and P. Stone. Libero: Benchmarking knowl- edge transfer for lifelong robot learning.Advances in Neural Information Processing Systems, 36:44776–44791, 2023

  53. [56]

    Cadene, S

    R. Cadene, S. Aliberts, F. Capuano, M. Aractingi, A. Zouitine, P. Kooijmans, J. Choghari, M. Russi, C. Pascal, S. Palma, et al. Lerobot: An open-source library for end-to-end robot learning.arXiv preprint arXiv:2602.22818, 2026

  54. [57]

    bowl on plate

    NVIDIA, J. Bjorck, F. Casta ˜neda, N. Cherniadev, X. Da, R. Ding, L. Fan, Y . Fang, D. Fox, F. Hu, et al. GR00T N1: An open foundation model for generalist humanoid robots.arXiv preprint arXiv:2503.14734, 2025. 12 Appendix Contents A Experimental Details and Evaluation Protocol 14 A.1 LIBERO Setup and Model Inference Details . . . . . . . . . . . . . . . ...