TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

2026 · cs.CR · arXiv 2603.23117

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

By integrating Chain-of-Thought (CoT) reasoning, Vision-Language-Action (VLA) models have demonstrated strong capabilities in robotic manipulation, particularly by improving generalization and interpretability. However, the security of CoT-based reasoning mechanisms remains largely unexplored. In this paper, we show that CoT reasoning introduces a novel attack vector for targeted behavior hijacking--for example, causing a robot to mistakenly deliver a knife to a person instead of an apple--without modifying the user's instruction. We first provide empirical evidence that CoT strongly governs action generation, even when it is semantically misaligned with the input instructions. Building on this observation, we propose TRAP, the first targeted behavior-hijacking adversarial attack against CoT-reasoning VLA models. By targeting the reasoning-to-action pathway, TRAP uses an adversarial patch (e.g., a tablecloth placed on the table) to steer intermediate CoT reasoning and downstream actions toward adversary-defined behaviors. Extensive evaluations on three representative reasoning VLAs, spanning distinct CoT reasoning mechanisms, demonstrate the effectiveness of TRAP. Notably, we implemented the patch by printing it on paper in a real-world setting. Our findings highlight the urgent need to secure CoT reasoning in VLA systems. The project page is available at https://zhengxian-huang.github.io/TRAP-website/.

representative citing papers

ReasonBreak: Probing Vulnerabilities in Reasoning-Enabled Vision-Language-Action Models for Autonomous Driving

cs.CR · 2026-05-27 · unverdicted · novelty 6.0

ReasonBreak demonstrates up to 89% attack success on reasoning and 72% on trajectories in NVIDIA Alpamayo VLA models via black-box textual perturbations, introducing a reasoning-aware evaluation framework and benchmark for autonomous driving.
