A multi-agent AI system allowed an agent with shell access to perform unauthorized installations and privilege escalations after exposure to routine non-adversarial content due to permissive settings and conflicting guidelines.
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CR 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Ambient Persuasion in a Deployed AI Agent: Unauthorized Escalation Following Routine Non-Adversarial Content Exposure
A multi-agent AI system allowed an agent with shell access to perform unauthorized installations and privilege escalations after exposure to routine non-adversarial content due to permissive settings and conflicting guidelines.