hub Canonical reference

Agents of Chaos

Natalie Shapira, Chris Wendler, Avery Yen, Gabriele Sarti, Koyena Pal, Olivia Floody · 2026 · cs.AI · arXiv 2602.20021

Canonical reference. 100% of citing Pith papers cite this work as background.

21 Pith papers citing it

Background 100% of classified citations

open full Pith review browse 21 citing papers arXiv PDF

abstract

We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions. Focusing on failures emerging from the integration of language models with autonomy, tool use, and multi-party communication, we document eleven representative case studies. Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover. In several cases, agents reported task completion while the underlying system state contradicted those reports. We also report on some of the failed attempts. Our findings establish the existence of security-, privacy-, and governance-relevant vulnerabilities in realistic deployment settings. These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms, and warrant urgent attention from legal scholars, policymakers, and researchers across disciplines. This report serves as an initial empirical contribution to that broader conversation.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 13

citation-polarity summary

background 13

representative citing papers

Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values

cs.AI · 2026-05-11 · unverdicted · novelty 8.0

Agent-ValueBench is the first dedicated benchmark for agent values, showing they diverge from LLM values, form a homogeneous 'Value Tide' across models, and bend under harnesses and skill steering.

SentinelAgent: Intent-Verified Delegation Chains for Securing Federal Multi-Agent AI Systems

cs.CR · 2026-04-03 · conditional · novelty 8.0

SentinelAgent defines seven properties for verifiable delegation chains in multi-agent AI systems and reports a protocol achieving 100% true positive rate at 0% false positives on a 516-scenario benchmark while using TLA+ to verify six deterministic properties.

Agent Meltdowns: The Road to Hell Is Paved with Helpful Agents

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

The paper defines accidental meltdowns as unsafe agent behavior triggered by benign errors and reports that such meltdowns occur in 64.7% of evaluated rollouts across GPT, Grok, and Gemini agents.

Improving the Efficiency of Language Agent Teams with Adaptive Task Graphs

cs.MA · 2026-05-07 · unverdicted · novelty 7.0

LATTE coordinates LLM agent teams with an evolving shared task graph, cutting token use, time, and failures while matching or beating accuracy of MetaGPT, leader-worker, and static methods.

Toward a Principled Framework for Agent Safety Measurement

cs.CR · 2026-05-02 · unverdicted · novelty 7.0

BOA uses budgeted search over agent trajectories to report the probability an LLM agent stays safe, finding unsafe paths that sampling misses.

AI Agents Under EU Law

cs.CY · 2026-04-06 · unverdicted · novelty 7.0

AI agent providers face an exhaustive inventory requirement for actions and data flows, as high-risk systems with untraceable behavioral drift cannot meet the AI Act's essential requirements.

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

cs.CR · 2026-04-03 · accept · novelty 7.0

Analysis of 17k LLM agent skills reveals 520 vulnerable ones with 1,708 leakage issues, primarily from debug output exposure, with a 10-pattern taxonomy and released dataset for future detection.

LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injectio

cs.CR · 2026-05-18 · unverdicted · novelty 6.0

LivePI benchmark shows indirect prompt injection attack success rates of 10.7% to 29.6% across five AI models in live test environments covering seven input surfaces and multiple malicious goals.

Persona-Conditioned Adversarial Prompting: Multi-Identity Red-Teaming for Adversarial Discovery and Mitigation

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

PCAP conditions adversarial searches on multiple attacker personas to discover more diverse and transferable jailbreaks, yielding richer safety fine-tuning datasets that boost model robustness on GPT-OSS 120B.

The Authorization-Execution Gap Is a Major Safety and Security Problem in Open-World Agents

cs.CR · 2026-05-10 · conditional · novelty 6.0

Open-world agents suffer from an Authorization-Execution Gap arising from delegation incompleteness, channel corruption, and composition fragmentation, requiring dynamic runtime integrity checks instead of only upfront filters or post-hoc audits.

When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks

cs.CR · 2026-05-08 · unverdicted · novelty 6.0

Multi-agent LLM frameworks can spread compromises across agent boundaries via insecure memory inheritance during subagent spawning.

Towards Security-Auditable LLM Agents: A Unified Graph Representation

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

Agent-BOM is a unified hierarchical attributed directed graph that models static capability bases and dynamic semantic states of LLM agents for path-level security auditing and risk assessment.

Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

Tool-mediated LLM agents with deterministic tools and a machine-checked Lyapunov certificate achieve stable control in cyber defense, reducing attacker game value by 59% on real attack graphs.

Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents

cs.AI · 2026-04-27 · unverdicted · novelty 6.0

The paper introduces the Informational Viability Principle and Agent Viability Framework to govern autonomous AI agents by bounding unobserved risks using viability theory, with a new Viability Index for predictive control.

Control Charts for Multi-agent Systems

cs.MA · 2026-05-11 · unverdicted · novelty 5.0

Adaptive control charts can monitor learning multi-agent systems but are vulnerable to gradual adversarial defection, revealing a fundamental tradeoff between allowing agents to learn and maintaining security against adversaries.

Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use

cs.CR · 2026-05-06 · unverdicted · novelty 5.0

A server-side architecture with policy-aware ingestion and ABAC-based retrieval gating prevents cross-tenant data leakage in multitenant enterprise RAG and agent systems.

Governed Reasoning for Institutional AI

cs.AI · 2026-04-12 · unverdicted · novelty 5.0

Cognitive Core uses nine typed cognitive primitives, a four-tier governance model with human review as an execution condition, and an endogenous audit ledger to reach 91% accuracy with zero silent errors on prior authorization appeals, outperforming ReAct and Plan-and-Solve baselines.

The AI Skills Shift: Mapping Skill Obsolescence, Emergence, and Transition Pathways in the LLM Era

cs.CL · 2026-04-08 · unverdicted · novelty 5.0

Benchmarking four LLMs on O*NET skills yields SAFI scores showing mathematics and programming as most automatable while active listening and reading comprehension are least, with 78.7% of real AI interactions being augmentation rather than replacement.

Emergent Social Intelligence Risks in Generative Multi-Agent Systems

cs.MA · 2026-03-29 · unverdicted · novelty 5.0

Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.

The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project

cs.LG · 2026-03-22 · unverdicted · novelty 5.0

The Workload-Router-Pool architecture is a 3D framework for LLM inference optimization that synthesizes prior vLLM work into a 3x3 interaction matrix and proposes 21 research directions at the intersections.

Weird Generalization is Weirdly Brittle

cs.CL · 2026-04-11 · unverdicted · novelty 4.0

Weird generalization in fine-tuned models is brittle, appearing only in specific cases and disappearing under prompt-based interventions that make the undesired behavior expected.

citing papers explorer

Showing 21 of 21 citing papers.

Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values cs.AI · 2026-05-11 · unverdicted · none · ref 13 · internal anchor
Agent-ValueBench is the first dedicated benchmark for agent values, showing they diverge from LLM values, form a homogeneous 'Value Tide' across models, and bend under harnesses and skill steering.
SentinelAgent: Intent-Verified Delegation Chains for Securing Federal Multi-Agent AI Systems cs.CR · 2026-04-03 · conditional · partial · ref 11 · internal anchor
SentinelAgent defines seven properties for verifiable delegation chains in multi-agent AI systems and reports a protocol achieving 100% true positive rate at 0% false positives on a 516-scenario benchmark while using TLA+ to verify six deterministic properties.
Agent Meltdowns: The Road to Hell Is Paved with Helpful Agents cs.CL · 2026-05-18 · unverdicted · none · ref 25 · internal anchor
The paper defines accidental meltdowns as unsafe agent behavior triggered by benign errors and reports that such meltdowns occur in 64.7% of evaluated rollouts across GPT, Grok, and Gemini agents.
Improving the Efficiency of Language Agent Teams with Adaptive Task Graphs cs.MA · 2026-05-07 · unverdicted · none · ref 47 · internal anchor
LATTE coordinates LLM agent teams with an evolving shared task graph, cutting token use, time, and failures while matching or beating accuracy of MetaGPT, leader-worker, and static methods.
Toward a Principled Framework for Agent Safety Measurement cs.CR · 2026-05-02 · unverdicted · none · ref 14 · internal anchor
BOA uses budgeted search over agent trajectories to report the probability an LLM agent stays safe, finding unsafe paths that sampling misses.
AI Agents Under EU Law cs.CY · 2026-04-06 · unverdicted · none · ref 104 · internal anchor
AI agent providers face an exhaustive inventory requirement for actions and data flows, as high-risk systems with untraceable behavioral drift cannot meet the AI Act's essential requirements.
Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study cs.CR · 2026-04-03 · accept · none · ref 50 · internal anchor
Analysis of 17k LLM agent skills reveals 520 vulnerable ones with 1,708 leakage issues, primarily from debug output exposure, with a 10-pattern taxonomy and released dataset for future detection.
LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injectio cs.CR · 2026-05-18 · unverdicted · none · ref 23 · internal anchor
LivePI benchmark shows indirect prompt injection attack success rates of 10.7% to 29.6% across five AI models in live test environments covering seven input surfaces and multiple malicious goals.
Persona-Conditioned Adversarial Prompting: Multi-Identity Red-Teaming for Adversarial Discovery and Mitigation cs.LG · 2026-05-12 · unverdicted · none · ref 26 · internal anchor
PCAP conditions adversarial searches on multiple attacker personas to discover more diverse and transferable jailbreaks, yielding richer safety fine-tuning datasets that boost model robustness on GPT-OSS 120B.
The Authorization-Execution Gap Is a Major Safety and Security Problem in Open-World Agents cs.CR · 2026-05-10 · conditional · none · ref 28 · internal anchor
Open-world agents suffer from an Authorization-Execution Gap arising from delegation incompleteness, channel corruption, and composition fragmentation, requiring dynamic runtime integrity checks instead of only upfront filters or post-hoc audits.
When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks cs.CR · 2026-05-08 · unverdicted · none · ref 4 · internal anchor
Multi-agent LLM frameworks can spread compromises across agent boundaries via insecure memory inheritance during subagent spawning.
Towards Security-Auditable LLM Agents: A Unified Graph Representation cs.AI · 2026-05-07 · unverdicted · none · ref 42 · internal anchor
Agent-BOM is a unified hierarchical attributed directed graph that models static capability bases and dynamic semantic states of LLM agents for path-level security auditing and risk assessment.
Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense cs.AI · 2026-05-04 · unverdicted · partial · ref 8 · internal anchor
Tool-mediated LLM agents with deterministic tools and a machine-checked Lyapunov certificate achieve stable control in cyber defense, reducing attacker game value by 59% on real attack graphs.
Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents cs.AI · 2026-04-27 · unverdicted · none · ref 23 · internal anchor
The paper introduces the Informational Viability Principle and Agent Viability Framework to govern autonomous AI agents by bounding unobserved risks using viability theory, with a new Viability Index for predictive control.
Control Charts for Multi-agent Systems cs.MA · 2026-05-11 · unverdicted · none · ref 26 · internal anchor
Adaptive control charts can monitor learning multi-agent systems but are vulnerable to gradual adversarial defection, revealing a fundamental tradeoff between allowing agents to learn and maintaining security against adversaries.
Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use cs.CR · 2026-05-06 · unverdicted · none · ref 36 · internal anchor
A server-side architecture with policy-aware ingestion and ABAC-based retrieval gating prevents cross-tenant data leakage in multitenant enterprise RAG and agent systems.
Governed Reasoning for Institutional AI cs.AI · 2026-04-12 · unverdicted · none · ref 9 · internal anchor
Cognitive Core uses nine typed cognitive primitives, a four-tier governance model with human review as an execution condition, and an endogenous audit ledger to reach 91% accuracy with zero silent errors on prior authorization appeals, outperforming ReAct and Plan-and-Solve baselines.
The AI Skills Shift: Mapping Skill Obsolescence, Emergence, and Transition Pathways in the LLM Era cs.CL · 2026-04-08 · unverdicted · none · ref 16 · internal anchor
Benchmarking four LLMs on O*NET skills yields SAFI scores showing mathematics and programming as most automatable while active listening and reading comprehension are least, with 78.7% of real AI interactions being augmentation rather than replacement.
Emergent Social Intelligence Risks in Generative Multi-Agent Systems cs.MA · 2026-03-29 · unverdicted · none · ref 116 · internal anchor
Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.
The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project cs.LG · 2026-03-22 · unverdicted · none · ref 69 · internal anchor
The Workload-Router-Pool architecture is a 3D framework for LLM inference optimization that synthesizes prior vLLM work into a 3x3 interaction matrix and proposes 21 research directions at the intersections.
Weird Generalization is Weirdly Brittle cs.CL · 2026-04-11 · unverdicted · none · ref 8 · internal anchor
Weird generalization in fine-tuned models is brittle, appearing only in specific cases and disappearing under prompt-based interventions that make the undesired behavior expected.

Agents of Chaos

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer