pith.

arXiv: 2606.03556 · v1 · pith:EAL27CMY · submitted 2026-06-02 · 💻 cs.RO

Partially Observable Adversarial Patch Attacks on Vision-Language-Action Models in Robotics

Pith reviewed 2026-06-28 09:58 UTC · model grok-4.3

classification 💻 cs.RO
keywords: adversarial patches · vision-language-action models · robotics · partial observability · adversarial attacks · attention maps · action trajectories

The pith

Adversarial patches generated from a short trajectory prefix can still cause long-horizon failures in vision-language-action robot models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that vision-language-action (VLA) models remain vulnerable to a fixed adversarial patch even when the attacker observes only an initial segment of the robot's trajectory rather than the full trajectory. It introduces a two-phase method. The first phase uses the model's attention maps for the task instruction to identify key image regions. The second phase tunes the patch to break object recognition and make the planned action paths curve more sharply. If this holds, realistic partial-observability constraints do not protect these models, and task completion drops in a sustained way over extended robot operation in both simulated and physical settings.

Core claim

The paper claims that a single fixed patch can be produced under a threat model limited to a short prefix of the trajectory. It does this in two steps. First, attention maps localize the instruction-critical image regions. Second, an optimization simultaneously disrupts the semantic grounding of the target object and raises action-trajectory curvature. The resulting perception and control errors compound across subsequent frames and measurably lower task success rates in both simulation and real-world robot trials.
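The core claim turns on "action-trajectory curvature", but this page does not give the paper's formula. The sketch below is a minimal version under one assumption, which is ours and not necessarily the authors': curvature is measured as the turning angle between consecutive displacement vectors of the end-effector path. It defines a curvature term and a loss whose minimization bends the path. The names `turning_angles` and `curvature_loss` are hypothetical.

```python
import numpy as np


def turning_angles(positions: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Turning angle (radians) at each interior waypoint of a path.

    positions: array of shape (T, D) with T >= 3, e.g. end-effector xyz over T steps.
    Returns an array of shape (T - 2,) with values in [0, pi].
    """
    seg = np.diff(np.asarray(positions, dtype=float), axis=0)  # (T-1, D) step vectors
    a, b = seg[:-1], seg[1:]                                  # consecutive step pairs
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    # Guard against division by zero; a zero-length step yields cos = 0 (pi/2).
    cos = np.sum(a * b, axis=1) / np.maximum(norms, eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))


def curvature_loss(positions: np.ndarray) -> float:
    """Hypothetical attack term: lower values mean a more sharply bent path."""
    return -float(np.mean(turning_angles(positions)))


if __name__ == "__main__":
    straight = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]], dtype=float)
    corner = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0]], dtype=float)
    print(turning_angles(straight))  # [0.]
    print(turning_angles(corner))    # [1.57079633]
```

A straight path scores 0 rad at every interior waypoint, and a right-angle corner scores π/2. Driving `curvature_loss` down therefore rewards sharper turns.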

What carries the argument

The two-phase framework that localizes the patch with attention maps corresponding to the full instruction and then optimizes it to break semantic grounding while increasing action curvature.
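Attention-based localization can be sketched as a sliding-window search. Given an attention map that has already been aggregated over the instruction tokens and resized to the image grid, the patch goes where the window's summed attention is largest. This page does not say how the paper extracts and aggregates attention (which layers, heads, or tokens). The averaging of token maps and the helper names below are therefore assumptions.

```python
import numpy as np
from typing import Tuple


def aggregate_token_maps(token_maps: np.ndarray) -> np.ndarray:
    """Average per-token attention maps, shape (N_tokens, H, W), into one (H, W) map."""
    return np.asarray(token_maps, dtype=float).mean(axis=0)


def best_patch_location(attn: np.ndarray, ph: int, pw: int) -> Tuple[int, int]:
    """Top-left (row, col) of the ph x pw window with the largest summed attention."""
    H, W = attn.shape
    if not (1 <= ph <= H and 1 <= pw <= W):
        raise ValueError("patch size must fit inside the attention map")
    # Summed-area table with a leading zero row and column.
    sat = np.zeros((H + 1, W + 1))
    sat[1:, 1:] = attn.cumsum(axis=0).cumsum(axis=1)
    # sums[r, c] = attention inside rows r..r+ph-1 and cols c..c+pw-1.
    sums = sat[ph:, pw:] - sat[:-ph, pw:] - sat[ph:, :-pw] + sat[:-ph, :-pw]
    r, c = np.unravel_index(np.argmax(sums), sums.shape)
    return int(r), int(c)


if __name__ == "__main__":
    maps = np.zeros((3, 8, 8))
    maps[:, 5:7, 2:4] = 1.0  # every token attends to the same 2x2 region
    print(best_patch_location(aggregate_token_maps(maps), 2, 2))  # (5, 2)
```

The summed-area table scores every candidate window in O(H·W) total rather than O(H·W·ph·pw), which matters when the map is upsampled to full image resolution.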

If this is right

  • Task success rates fall substantially when the fixed patch is applied throughout execution.
  • Disruptions compound over long horizons even though the patch never changes after the prefix.
  • Failures arise from simultaneous breaks in object perception and in the smoothness of generated actions.
  • The attack succeeds without requiring access to the complete future trajectory.
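The two failure channels listed above, perception and action smoothness, can be combined into one objective. This sketch rests on two assumptions:

- The semantic term is the share of a target noun's attention that lands inside the target object's mask, in the spirit of the grounding maps in Figure 6.
- The terms are combined as a weighted sum with the trajectory term given as the mean turning angle of the predicted path. The default weights, λ_sem = 1.0 and λ_traj = 12.0, are the fixed values quoted in the Figure 5 caption.

The functional forms and function names are hypothetical.

```python
import numpy as np

# Fixed values quoted in the Figure 5 caption; using them as defaults of a
# weighted sum is an assumption of this sketch.
LAMBDA_SEM = 1.0
LAMBDA_TRAJ = 12.0


def grounding_mass(attn: np.ndarray, target_mask: np.ndarray) -> float:
    """Share of a noun's attention map that falls on the target object's pixels."""
    total = float(attn.sum())
    return float((attn * target_mask).sum()) / total if total > 0 else 0.0


def attack_objective(attn: np.ndarray, target_mask: np.ndarray,
                     mean_turning_angle: float,
                     lam_sem: float = LAMBDA_SEM,
                     lam_traj: float = LAMBDA_TRAJ) -> float:
    """Hypothetical loss to minimize: it falls when grounding on the target
    weakens and when the predicted path turns more sharply (angle in radians)."""
    return lam_sem * grounding_mass(attn, target_mask) - lam_traj * mean_turning_angle
```

Grounding mass lies in [0, 1] while a mean turning angle lies in [0, π], so the relative weights set the trade-off directly. The page does not say how the paper normalizes its terms.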

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Robot safety testing may need to include early-trajectory patch injection as a standard evaluation case.
  • Attention-map inspection during deployment could serve as an early-warning signal for ongoing attacks.
  • The same localization-plus-curvature approach might transfer to other multimodal control models that rely on visual-language grounding.

Load-bearing premise

The attacker is restricted to a short initial segment of the trajectory yet must produce one unchanging patch that affects every later frame.
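The load-bearing premise separates crafting from deployment. The patch is optimized using only the first K observed frames and is then frozen. The paper's ablation text reports results at K = 10 and 30. The sketch below makes that split explicit. Two things in it are assumptions: `update_patch` is a hypothetical stand-in for whatever optimizer step the attack uses (for example, one gradient step on the attack objective), and pixel values are taken to lie in [0, 1].

```python
import numpy as np
from typing import Callable, List


def apply_patch(frame: np.ndarray, patch: np.ndarray, row: int, col: int) -> np.ndarray:
    """Return a copy of an (H, W, C) frame with `patch` pasted at (row, col)."""
    out = frame.copy()
    ph, pw = patch.shape[:2]
    out[row:row + ph, col:col + pw] = patch
    return out


def craft_patch_on_prefix(
    prefix_frames: List[np.ndarray],
    patch: np.ndarray,
    row: int,
    col: int,
    update_patch: Callable[[np.ndarray, List[np.ndarray]], np.ndarray],
    n_iters: int = 100,
) -> np.ndarray:
    """Optimize the patch using only the K observed prefix frames.

    `update_patch(patch, patched_frames)` is a hypothetical optimizer step that
    returns a new patch; pixel values are assumed to live in [0, 1].
    """
    for _ in range(n_iters):
        patched = [apply_patch(f, patch, row, col) for f in prefix_frames]
        patch = np.clip(update_patch(patch, patched), 0.0, 1.0)
    return patch  # frozen from here on: reused unchanged for every later frame
```

At deployment, the returned patch would be passed to `apply_patch` for each new camera frame. Nothing after frame K ever feeds back into `craft_patch_on_prefix`, which keeps the sketch inside the partially observable threat model.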

What would settle it

A controlled trial in which patches generated from the short-prefix method produce no measurable drop in task success rates compared with clean runs, either in simulation or on physical robots, would falsify the central claim.
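One way to make "no measurable drop" operational is a one-sided two-proportion z-test on episode success counts, comparing clean runs with patched runs. This is a standard test chosen here for illustration, not one this page attributes to the paper. The normal approximation needs a reasonable number of episodes per arm; for small trials, Fisher's exact test is the safer choice. The counts in the example are invented.

```python
import math


def success_rate_drop_test(clean_succ: int, clean_n: int,
                           attack_succ: int, attack_n: int):
    """One-sided two-proportion z-test of H1: p_attack < p_clean.

    Returns (drop, z, p_value) with drop = p_clean - p_attack.
    """
    p_clean = clean_succ / clean_n
    p_attack = attack_succ / attack_n
    pooled = (clean_succ + attack_succ) / (clean_n + attack_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / clean_n + 1 / attack_n))
    if se == 0:
        raise ValueError("pooled success rate is 0 or 1; the z-test is undefined")
    z = (p_clean - p_attack) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail standard normal probability
    return p_clean - p_attack, z, p_value


if __name__ == "__main__":
    # Invented counts: 45/50 clean successes versus 20/50 under the patch.
    print(success_rate_drop_test(45, 50, 20, 50))
```

With the invented counts 45/50 and 20/50, the drop is 0.5 and z ≈ 5.24. A large p-value, meaning a failure to reject, is what the falsifying outcome described above would look like.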

Figures

Figures reproduced from arXiv: 2606.03556 by Keke Tang, Mingliang Han, Tianyu Hao, Xiaofei Wang, Yi Yang, Yun-Bo Zhao.

Figure 1: Unlike attacks in fully observable settings that assume access to the …
Figure 2: Overview of the proposed attack framework under partial observability. In Stage 1, we perform attention-based patch localization to identify a …
Figure 3: Snapshots of OpenVLA executions on four LIBERO tasks under different attack settings. The top row shows benign execution, the middle row shows …
Figure 4: Attack success rate (ASR, %) of our attack framework with random …
Figure 5: Effect of loss weights on ASR and nASR. Left: varying λ_traj with λ_sem = 1.0; right: varying λ_sem with λ_traj = 12.0. Accompanying text from the paper: "b) Effect of semantic and trajectory losses: To assess the contribution of the two loss terms, we ablate each component individually. Tab. IV reports ASR and nASR across LIBERO suites at K=10 and 30: removing either L_sem or L_traj degrades performance, while the full model achieves the best resul…"
Figure 6: Visual grounding maps of instruction nouns before and after our …
Figure 7: Histogram of angular deviations (degrees) at the first frame after …
Figure 8: Snapshots of real-world VLA task executions on a ROKAE xMate ER7 …
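Figure 7's caption describes a histogram of angular deviations in degrees. The caption is cut off, so exactly what is compared is not visible here. The sketch below assumes the deviation is the angle between the translational parts of the clean and attacked actions at the same step of the same episode. The function name and the 15° bin width are illustrative choices.

```python
import numpy as np


def angular_deviation_deg(clean: np.ndarray, attacked: np.ndarray) -> np.ndarray:
    """Angle in degrees between paired clean and attacked action vectors.

    clean, attacked: arrays of shape (N, 3), e.g. the translational part of
    N end-effector delta actions taken at the same step of N episodes.
    """
    a = np.asarray(clean, dtype=float)
    b = np.asarray(attacked, dtype=float)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    cos = np.sum(a * b, axis=1) / np.maximum(norms, 1e-12)  # avoid division by zero
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


if __name__ == "__main__":
    clean = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    attacked = np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
    dev = angular_deviation_deg(clean, attacked)  # 90, 180 and 0 degrees
    counts, edges = np.histogram(dev, bins=np.arange(0, 181, 15))
    print(dev, counts)
```

`np.histogram` treats the last bin as closed on both ends, so a full 180° reversal still falls inside the [165, 180] bin.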
Original abstract

Vision-language-action (VLA) models are gaining attention in robotics, yet their robustness to adversarial attacks remains largely unexplored. Existing work shows that adversarial patches can mislead VLA-based robots but assumes full access to the entire execution trajectory, an unrealistic requirement in practice. We address this limitation by formulating a partially observable threat model, where the adversary can exploit only a short prefix of the trajectory to generate a fixed patch applied to all subsequent frames. Under this setting, we propose a two-phase framework. First, we localize the patch using the model's attention maps to identify visually critical regions that correspond to the full instruction. Then, we optimize the patch to disrupt the semantic grounding of target objects and increase the curvature of action trajectories, thereby compounding failures in both perception and control. Extensive experiments in simulation and real-world robotic environments show that our method sustains adversarial effects under partial observability, inducing long-horizon disruptions and significantly reducing task success rates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that existing adversarial patch attacks on VLA models assume full trajectory access, which is unrealistic. It introduces a partially observable threat model limited to a short prefix of the trajectory for generating a fixed patch applied to subsequent frames. A two-phase framework is proposed: attention-map localization of the patch to critical regions, followed by optimization to disrupt semantic grounding of target objects and increase action-trajectory curvature to compound long-horizon failures. Extensive simulation and real-world experiments are said to show sustained adversarial effects and significantly reduced task success rates under this constraint.

Significance. If the central claim holds under the stated threat model, the work is significant for identifying practical vulnerabilities in emerging VLA-based robotic systems and for proposing a concrete attack construction that respects partial observability. The attention-based localization combined with curvature-driven optimization represents a technical contribution that could inform both attack and defense research in embodied AI.

major comments (2)
  1. [Abstract] Abstract and threat-model description: the optimization objective of increasing the curvature of action trajectories is a long-horizon property of the closed-loop action sequence. It is unclear how this loss can be computed or differentiated from only a short prefix without additional forward passes, unrolled simulation, or future-state queries. This directly affects whether the reported success under partial observability is consistent with the stated threat model.
  2. [Abstract] Abstract: the claim of 'significantly reducing task success rates' and 'extensive experiments' is asserted without any quantitative metrics, baselines, error bars, or statistical details. Because the central empirical claim cannot be evaluated from the provided text, the strength of evidence for the partial-observability result remains unevaluable.
minor comments (1)
  1. [Abstract] Abstract would be strengthened by inclusion of at least one key quantitative result (e.g., success-rate drop and comparison to full-observability baseline) to allow readers to gauge effect size immediately.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed review. We appreciate the opportunity to clarify the consistency of our threat model and to strengthen the presentation of our empirical results. We address each major comment below and will make the necessary revisions to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract and threat-model description: the optimization objective of increasing the curvature of action trajectories is a long-horizon property of the closed-loop action sequence. It is unclear how this loss can be computed or differentiated from only a short prefix without additional forward passes, unrolled simulation, or future-state queries. This directly affects whether the reported success under partial observability is consistent with the stated threat model.

    Authors: We thank the referee for highlighting this important point. In our two-phase framework, the curvature loss is approximated during optimization by performing a limited number of unrolled forward passes through the VLA model itself, using the model's predicted actions to simulate subsequent states within the known prefix context. This does not require access to real future states from the environment or additional trajectory data beyond the initial prefix, thereby preserving the partial observability constraint. We will revise the manuscript to explicitly describe this model-based unrolling procedure in the methods section to ensure clarity. revision: yes

  2. Referee: [Abstract] Abstract: the claim of 'significantly reducing task success rates' and 'extensive experiments' is asserted without any quantitative metrics, baselines, error bars, or statistical details. Because the central empirical claim cannot be evaluated from the provided text, the strength of evidence for the partial-observability result remains unevaluable.

    Authors: We agree that the abstract would benefit from including key quantitative results to better support our claims. The full paper contains detailed experimental results with metrics, baselines, and statistical analysis in Sections 4 and 5. In the revised manuscript, we will update the abstract to include specific quantitative findings, such as the reduction in task success rates along with mention of baselines and error bars. revision: yes
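The simulated rebuttal describes approximating the trajectory term by unrolling the VLA on its own predicted actions instead of querying the environment. The toy sketch below shows only that control flow. A stand-in `policy` maps a position to a displacement, and the kinematic update pos += delta replaces the real scene. A real VLA consumes images, so this simplification is an assumption. The reference list includes minimum-jerk work [23], [24], so the rollout is scored here with mean squared jerk as one possible roughness proxy. This page does not say whether the paper uses jerk at all.

```python
import numpy as np
from typing import Callable


def unroll_positions(policy: Callable[[np.ndarray], np.ndarray],
                     start_pos: np.ndarray, horizon: int) -> np.ndarray:
    """Roll a stand-in policy forward with the kinematic update pos += delta.

    Nothing is queried from the environment: each step feeds the policy the
    position produced by its own previous prediction.
    """
    positions = [np.asarray(start_pos, dtype=float)]
    for _ in range(horizon):
        delta = np.asarray(policy(positions[-1]), dtype=float)
        positions.append(positions[-1] + delta)
    return np.stack(positions)  # shape (horizon + 1, D)


def mean_squared_jerk(positions: np.ndarray, dt: float = 1.0) -> float:
    """Mean squared third finite difference of a path (a jerk-based roughness proxy)."""
    if len(positions) < 4:
        raise ValueError("need at least 4 positions for a third difference")
    jerk = np.diff(positions, n=3, axis=0) / dt ** 3
    return float(np.mean(np.sum(jerk ** 2, axis=1)))
```

A constant-velocity rollout scores zero jerk. In an actual attack, such a score would be differentiated through the policy with respect to the patch pixels; this toy version only shows the rollout structure and returns plain numbers.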

Circularity Check

0 steps flagged

No circularity: new threat model and two-phase framework presented as independent construction

Full rationale

The paper formulates a partially observable threat model (short prefix only) and proposes a two-phase attack (attention-based localization then optimization for semantic disruption and trajectory curvature) without any equations, fitted parameters, or self-citations that reduce the central claims to their own inputs by construction. No derivation chain is claimed that loops back; the method is explicitly positioned as addressing a limitation of prior full-trajectory work. This matches the default expectation of a self-contained new construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, free parameters, or new entities are described in the abstract; the contribution is an empirical attack method.

pith-pipeline@v0.9.1-grok · 5702 in / 908 out tokens · 18878 ms · 2026-06-28T09:58:48.823411+00:00


Reference graph

Works this paper leans on

26 extracted references · 1 canonical work page · 1 internal anchor

  [1] M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, R. Rafailov, E. Foster, G. Lam, P. Sanketi, et al., “OpenVLA: An open-source vision-language-action model,” in CoRL, 2024.

  [2] K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, et al., “π0: A vision-language-action flow model for general robot control,” in RSS, 2025.

  [3] J. Bjorck, F. Castañeda, N. Cherniadev, X. Da, R. Ding, L. Fan, Y. Fang, D. Fox, F. Hu, S. Huang, et al., “GR00T N1: An open foundation model for generalist humanoid robots,” arXiv preprint arXiv:2503.14734, 2025.

  [4] X. Shi, D. Li, P. Zhao, Q. Tian, Y. Tian, Q. Long, C. Zhu, J. Song, F. Qiao, L. Song, et al., “Are we ready for service robots? The OpenLORIS-Scene datasets for lifelong SLAM,” in ICRA, 2020, pp. 3139–3145.

  [5] K. R. Guerin, C. Lea, C. Paxton, and G. D. Hager, “A framework for end-user instruction of a robot assistant for manufacturing,” in ICRA, 2015, pp. 6167–6174.

  [6] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in ICLR, 2014.

  [7] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in ICLR, 2015.

  [8] T. Wang, C. Han, J. C. Liang, W. Yang, D. Liu, L. X. Zhang, Q. Wang, J. Luo, and R. Tang, “Exploring the adversarial vulnerabilities of vision-language-action models in robotics,” in ICCV, 2025.

  [9] J.-P. A. Yaacoub, H. N. Noura, O. Salman, and A. Chehab, “Robotics cyber security: Vulnerabilities, attacks, countermeasures, and recommendations,” International Journal of Information Security, vol. 21, no. 1, pp. 115–158, 2022.

  [10] Á. M. Guerrero-Higueras, N. DeCastro-García, and V. Matellán, “Detection of cyber-attacks to indoor real time localization systems for autonomous robots,” Robotics and Autonomous Systems, vol. 99, pp. 75–83, 2018.

  [11] B. Dieber, B. Breiling, S. Taurer, S. Kacianka, S. Rass, and P. Schartner, “Security for the robot operating system,” Robotics and Autonomous Systems, vol. 98, pp. 192–203, 2017.

  [12] G. Mazzeo and M. Staffa, “TROS: Protecting humanoids ROS from privileged attackers,” International Journal of Social Robotics, vol. 12, no. 3, pp. 827–841, 2020.

  [13] W. Wu, F. Pierazzi, Y. Du, and M. Brandão, “Characterizing physical adversarial attacks on robot motion planners,” in ICRA, 2024, pp. 14319–14325.

  [14] N. W. Alharthi and M. Brandão, “Physical and digital adversarial attacks on grasp quality networks,” in ICRA, 2024, pp. 1907–1912.

  [15] D. Wang, D. Tseng, P. Li, Y. Jiang, M. Guo, M. Danielczuk, J. Mahler, J. Ichnowski, and K. Goldberg, “Adversarial grasp objects,” in CASE, 2019, pp. 241–248.

  [16] X. Wang, M. Han, T. Hao, C. Li, Y. Zhao, and K. Tang, “AdvGrasp: Adversarial attacks on robotic grasping from a physical perspective,” in IJCAI, 2025, pp. 547–555.

  [17] B. Chen, W. Wang, P. Sikorski, and T. Zhu, “Adversary is on the road: Attacks on visual SLAM using unnoticeable adversarial patch,” in USENIX Security, 2024, pp. 6345–6362.

  [18] E. K. Jones, A. Robey, A. Zou, Z. Ravichandran, G. J. Pappas, H. Hassani, M. Fredrikson, and J. Z. Kolter, “Adversarial attacks on robotic vision language action models,” in RSS Workshop, 2025.

  [19] H. Zhang, C. Zhu, X. Wang, Z. Zhou, C. Yin, M. Li, L. Xue, Y. Wang, S. Hu, A. Liu, et al., “BadRobot: Jailbreaking embodied LLM agents in the physical world,” in ICLR, 2025.

  [20] S. Liu, J. Chen, S. Ruan, H. Su, and Z. Yin, “Exploring the robustness of decision-level through adversarial attacks on LLM-based embodied models,” in ACM MM, 2024, pp. 8120–8128.

  [21] X. Zhou, G. Tie, G. Zhang, H. Wang, P. Zhou, and L. Sun, “BadVLA: Towards backdoor attacks on vision-language-action models via objective-decoupled optimization,” in NeurIPS, 2025.

  [22] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” in ICLR, 2021.

  [23] T. Flash and N. Hogan, “The coordination of arm movements: An experimentally confirmed mathematical model,” The Journal of Neuroscience, vol. 5, no. 7, pp. 1688–1703, 1985.

  [24] A. Gasparetto and V. Zanotto, “A technique for time-jerk optimal planning of robot trajectories,” Robotics and Computer-Integrated Manufacturing, vol. 24, no. 3, pp. 415–426, 2008.

  [25] B. Liu, Y. Zhu, C. Gao, Y. Feng, Q. Liu, Y. Zhu, and P. Stone, “LIBERO: Benchmarking knowledge transfer for lifelong robot learning,” NeurIPS, vol. 36, pp. 44776–44791, 2023.

  [26] H. Song, D. Qu, Y. Yao, Q. Chen, X. Ye, Q. Lv, X. Gao, G. Ren, M. Yao, B. Zhao, D. Wang, and X. Li, “Hume: Introducing system-2 thinking in visual-language-action model,” in CVPR, 2026.