Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

2026 · cs.RO · arXiv 2604.23775

4 Pith papers cite this work. Polarity classification is still indexing.


abstract

Vision-Language-Action (VLA) models are emerging as a unified substrate for embodied intelligence. This shift raises a new class of safety challenges, stemming from the embodied nature of VLA systems, including irreversible physical consequences, a multimodal attack surface across vision, language, and state, real-time latency constraints on defense, error propagation over long-horizon trajectories, and vulnerabilities in the data supply chain. Yet the literature remains fragmented across robotic learning, adversarial machine learning, AI alignment, and autonomous systems safety. This survey provides a unified and up-to-date overview of safety in Vision-Language-Action models. We organize the field along two parallel timing axes, attack timing (training-time vs. inference-time) and defense timing (training-time vs. inference-time), linking each class of threat to the stage at which it can be mitigated. We first define the scope of VLA safety, distinguishing it from text-only LLM safety and classical robotic safety, and review the foundations of VLA models, including architectures, training paradigms, and inference mechanisms. We then examine the literature through four lenses: Attacks, Defenses, Evaluation, and Deployment. We survey training-time threats such as data poisoning and backdoors, as well as inference-time attacks including adversarial patches, cross-modal perturbations, semantic jailbreaks, and freezing attacks. We review training-time and runtime defenses, analyze existing benchmarks and metrics, and discuss safety challenges across six deployment domains. Finally, we highlight key open problems, including certified robustness for embodied trajectories, physically realizable defenses, safety-aware training, unified runtime safety architectures, and standardized evaluation.

representative citing papers

SPOT-E: Test-Time Entropy Shaping with Visual Spotlights for Frozen VLMs

cs.CV · 2026-06-18 · unverdicted · novelty 7.0

SPOT-E uses entropy shaping on answer predictions with low-entropy anchors to optimize visual spotlights at test time via GRPO for better VLM performance on evidence-intensive tasks.

Trajectory-Level Redirection Attacks on Vision-Language-Action Models

cs.RO · 2026-06-11 · unverdicted · novelty 7.0

A prompt-only attack called command-preserving trajectory redirection can steer VLA robot behavior to attacker-chosen physical outcomes while the text still appears to match the intended task.

Uncertainty Quantification for Flow-Based Vision-Language-Action Models

cs.RO · 2026-06-16 · unverdicted · novelty 6.0

Velocity-field disagreement yields calibrated epistemic uncertainty for flow-based VLAs, supporting failure detection and uncertainty-guided active fine-tuning that cuts required samples by ≥22%.

Hallucination as Exploit: Evidence-Carrying Multimodal Agents

cs.AI · 2026-05-18 · unverdicted · novelty 6.0

Evidence-carrying multimodal agents decompose tool calls into predicates, obtain certificates from DOM/OCR/AX verifiers, and use a deterministic gate to authorize actions only when certificates support them, achieving zero unsafe executions in tested tasks.

citing papers explorer

Showing 4 of 4 citing papers after filters.

SPOT-E: Test-Time Entropy Shaping with Visual Spotlights for Frozen VLMs

cs.CV · 2026-06-18 · unverdicted · none · ref 21 · internal anchor

SPOT-E uses entropy shaping on answer predictions with low-entropy anchors to optimize visual spotlights at test time via GRPO for better VLM performance on evidence-intensive tasks.

Trajectory-Level Redirection Attacks on Vision-Language-Action Models

cs.RO · 2026-06-11 · unverdicted · none · ref 51 · internal anchor

A prompt-only attack called command-preserving trajectory redirection can steer VLA robot behavior to attacker-chosen physical outcomes while the text still appears to match the intended task.

Uncertainty Quantification for Flow-Based Vision-Language-Action Models

cs.RO · 2026-06-16 · unverdicted · none · ref 43 · internal anchor

Velocity-field disagreement yields calibrated epistemic uncertainty for flow-based VLAs, supporting failure detection and uncertainty-guided active fine-tuning that cuts required samples by ≥22%.

Hallucination as Exploit: Evidence-Carrying Multimodal Agents

cs.AI · 2026-05-18 · unverdicted · none · ref 27 · internal anchor

Evidence-carrying multimodal agents decompose tool calls into predicates, obtain certificates from DOM/OCR/AX verifiers, and use a deterministic gate to authorize actions only when certificates support them, achieving zero unsafe executions in tested tasks.
