Benchmarking the Robustness of Agentic Systems to Adversarially-Induced Harms

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: dataset 1

citation-polarity summary: background 1

years: 2026 (3)

representative citing papers

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents

cs.CY · 2026-04-11 · accept · novelty 8.0

This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks. It finds that risk coverage is broad but shallow, that rankings show no concordance across evaluations, and that benchmark choice systematically alters reported safety.