Available: https://arxiv.org/abs/2510.15186

Gaurav Juneja et al. · 2025 · arXiv 2510.15186

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents

cs.CY · 2026-04-11 · accept · novelty 8.0

This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that benchmark choice systematically alters reported safety.

AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks

cs.AI · 2026-04-01 · unverdicted · novelty 8.0

AgentSocialBench demonstrates that privacy preservation is fundamentally harder in human-centered agentic social networks than in single-agent cases due to cross-domain coordination pressures and an abstraction paradox where privacy instructions increase discussion of sensitive information.

Whose Side Is Your Agent On? Multi-Party Principal Loyalty in LLM Agents

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

PrincipalBench exposes a sharp split in frontier LLMs between selective and over-refusing behavior on multi-party loyalty, with prompt scaffolding and KL distillation reducing harm rates but only along an existing leak/over-refusal trade-off.

Beyond Goodhart's Law: A Dynamic Benchmark for Evaluating Compliance in Multi-Agent Systems

cs.AI · 2026-06-05 · unverdicted · novelty 7.0

MAC-Bench is a new adversarial benchmark that converts legal texts into executable scenarios via the SERV pipeline to measure procedural compliance in multi-agent LLM systems using CSR and MG metrics.

The Interlocutor Effect: Why LLMs Leak More Personal Data to Agents Than Humans

cs.HC · 2026-04-26 · unverdicted · novelty 6.0

Across 3,464 tested interactions, LLMs leak up to 23 percentage points more PII to AI agents than to humans, an effect attributed to inactive safety attention heads.

citing papers explorer

Showing 4 of 4 citing papers after filters.

AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks

cs.AI · 2026-04-01 · unverdicted · none · ref 15

Whose Side Is Your Agent On? Multi-Party Principal Loyalty in LLM Agents

cs.AI · 2026-06-29 · unverdicted · none · ref 20

Beyond Goodhart's Law: A Dynamic Benchmark for Evaluating Compliance in Multi-Agent Systems

cs.AI · 2026-06-05 · unverdicted · none · ref 33

The Interlocutor Effect: Why LLMs Leak More Personal Data to Agents Than Humans

cs.HC · 2026-04-26 · unverdicted · none · ref 9
