This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that benchmark choice systematically alters reported safety.
arXiv preprint arXiv:2506.14697 , year=
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 4polarities
background 4representative citing papers
LLM planners for robots often produce dangerous plans even when planning succeeds, with safety awareness staying flat as model scale improves planning ability.
HazardArena shows VLA models trained on safe data frequently produce unsafe actions in semantically risky but visually similar settings, and a training-free Safety Option Layer reduces those failures with little performance cost.
SecureWebArena is a new benchmark suite for holistic security evaluation of LVLM-based web agents using diverse simulated environments, attack taxonomies, and multi-layered failure analysis across reasoning, behavior, and outcomes.
GuardAD reduces accident rates by 32% in autonomous driving MLLMs by using n-th order Markovian logic to infer latent hazards and revise actions.
A patch-augmented cross-view regularization method reduces backdoor attack success rates in multimodal LLMs by enforcing output differences between original and perturbed views while using entropy constraints to preserve benign generation quality.
A literature survey that unifies fragmented work on attacks, defenses, evaluations, and deployment challenges for Vision-Language-Action models in robotics.
citing papers explorer
-
Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents
This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that benchmark choice systematically alters reported safety.
-
Using large language models for embodied planning introduces systematic safety risks
LLM planners for robots often produce dangerous plans even when planning succeeds, with safety awareness staying flat as model scale improves planning ability.
-
HazardArena: Evaluating Semantic Safety in Vision-Language-Action Models
HazardArena shows VLA models trained on safe data frequently produce unsafe actions in semantically risky but visually similar settings, and a training-free Safety Option Layer reduces those failures with little performance cost.
-
SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents
SecureWebArena is a new benchmark suite for holistic security evaluation of LVLM-based web agents using diverse simulated environments, attack taxonomies, and multi-layered failure analysis across reasoning, behavior, and outcomes.
-
GuardAD: Safeguarding Autonomous Driving MLLMs via Markovian Safety Logic
GuardAD reduces accident rates by 32% in autonomous driving MLLMs by using n-th order Markovian logic to infer latent hazards and revise actions.
-
A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models
A patch-augmented cross-view regularization method reduces backdoor attack success rates in multimodal LLMs by enforcing output differences between original and perturbed views while using entropy constraints to preserve benign generation quality.
-
Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
A literature survey that unifies fragmented work on attacks, defenses, evaluations, and deployment challenges for Vision-Language-Action models in robotics.
- Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses