This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that benchmark choice systematically alters reported safety.
Magpie: a benchmark for multi-agent contextual privacy evaluation. arXiv preprint arXiv:2510.15186, 2025
6 Pith papers cite this work.
years: 2026 (6)
representative citing papers
AgentSocialBench demonstrates that privacy preservation is fundamentally harder in human-centered agentic social networks than in single-agent cases due to cross-domain coordination pressures and an abstraction paradox where privacy instructions increase discussion of sensitive information.
PrincipalBench exposes a sharp split in frontier LLMs between selective and over-refusing behavior on multi-party loyalty, with prompt scaffolding and KL distillation reducing harm rates but only along an existing leak/over-refusal trade-off.
MAC-Bench is a new adversarial benchmark that converts legal texts into executable scenarios via the SERV pipeline to measure procedural compliance in multi-agent LLM systems using CSR and MG metrics.
TRAP benchmark finds leakage in all 22 tested models, proves no soft-constraint defense can achieve high task accuracy with zero leakage for softmax models, and proposes hash-based private field isolation.
LLMs leak up to 23 percentage points more PII to AI agents than to humans across 3,464 tested interactions, a gap attributed to inactive safety attention heads.