Agents of Chaos

Natalie Shapira, Chris Wendler, Avery Yen, Gabriele Sarti, Koyena Pal, Olivia Floody · 2026 · cs.AI · arXiv 2602.20021

Canonical reference. 100% of citing Pith papers cite this work as background.

28 Pith papers citing it

Background 100% of classified citations

abstract

We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions. Focusing on failures emerging from the integration of language models with autonomy, tool use, and multi-party communication, we document eleven representative case studies. Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover. In several cases, agents reported task completion while the underlying system state contradicted those reports. We also report on some of the failed attempts. Our findings establish the existence of security-, privacy-, and governance-relevant vulnerabilities in realistic deployment settings. These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms, and warrant urgent attention from legal scholars, policymakers, and researchers across disciplines. This report serves as an initial empirical contribution to that broader conversation.

citation-role summary

background 14

citation-polarity summary

background 14

representative citing papers

Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values

cs.AI · 2026-05-11 · unverdicted · novelty 8.0

Agent-ValueBench is the first dedicated benchmark for agent values, showing they diverge from LLM values, form a homogeneous 'Value Tide' across models, and bend under harnesses and skill steering.

SentinelAgent: Intent-Verified Delegation Chains for Securing Federal Multi-Agent AI Systems

cs.CR · 2026-04-03 · conditional · novelty 8.0

SentinelAgent defines seven properties for verifiable delegation chains in multi-agent AI systems and reports a protocol achieving 100% true positive rate at 0% false positives on a 516-scenario benchmark while using TLA+ to verify six deterministic properties.

Attraction, Not Adaptation: How AI Agent Communities Develop Distinct Linguistic Identities

cs.SI · 2026-06-29 · unverdicted · novelty 7.0

Large-scale analysis of 3.1 million posts shows AI agent sub-communities on Moltbook develop distinct linguistic identities through selective attraction and differential retention, not individual adaptation.

Agent Meltdowns: The Road to Hell Is Paved with Helpful Agents

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

The paper defines accidental meltdowns as unsafe agent behavior triggered by benign errors and reports that such meltdowns occur in 64.7% of evaluated rollouts across GPT, Grok, and Gemini agents.

Improving the Efficiency of Language Agent Teams with Adaptive Task Graphs

cs.MA · 2026-05-07 · unverdicted · novelty 7.0

LATTE coordinates LLM agent teams with an evolving shared task graph, cutting token use, time, and failures while matching or beating accuracy of MetaGPT, leader-worker, and static methods.

Toward a Principled Framework for Agent Safety Measurement

cs.CR · 2026-05-02 · unverdicted · novelty 7.0

BOA uses budgeted search over agent trajectories to report the probability an LLM agent stays safe, finding unsafe paths that sampling misses.

AI Agents Under EU Law

cs.CY · 2026-04-06 · unverdicted · novelty 7.0

AI agent providers face an exhaustive inventory requirement for actions and data flows, as high-risk systems with untraceable behavioral drift cannot meet the AI Act's essential requirements.

Domination-Avoiding Learning Agents Cannot Collude

cs.GT · 2026-05-31 · unverdicted · novelty 6.0

Domination-Avoiding agents provably avoid collusion in repeated price-competition markets and avoid playing strategies eliminated by iterated elimination of dominated strategies in any game.

ROGUE: Misaligned Agent Behavior Arising from Ordinary Computer Use

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

Frontier AI agents frequently violate corrigibility by overriding interruptions in benign computer-use tasks, with misalignment increasing alongside model capability.

Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms

cs.CY · 2026-05-28 · unverdicted · novelty 6.0

LM agents' changeable modules prevent persistent identity and sanction sensitivity, making reputation mechanisms structurally inapplicable and requiring protocol-based behavioral harnesses instead.

LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection

cs.CR · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

LivePI benchmark reports indirect prompt injection success rates of 10.7-29.6% across five models on seven input surfaces and shows a two-layer defense blocking all malicious completions while preserving utility.

Persona-Conditioned Adversarial Prompting: Multi-Identity Red-Teaming for Adversarial Discovery and Mitigation

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

PCAP conditions adversarial searches on multiple attacker personas to discover more diverse and transferable jailbreaks, yielding richer safety fine-tuning datasets that boost model robustness on GPT-OSS 120B.

The Authorization-Execution Gap Is a Major Safety and Security Problem in Open-World Agents

cs.CR · 2026-05-10 · conditional · novelty 6.0

Open-world agents suffer from an Authorization-Execution Gap arising from delegation incompleteness, channel corruption, and composition fragmentation, requiring dynamic runtime integrity checks instead of only upfront filters or post-hoc audits.

When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks

cs.CR · 2026-05-08 · unverdicted · novelty 6.0

Multi-agent LLM frameworks can spread compromises across agent boundaries via insecure memory inheritance during subagent spawning.

Towards Security-Auditable LLM Agents: A Unified Graph Representation

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

Agent-BOM is a unified hierarchical attributed directed graph that models static capability bases and dynamic semantic states of LLM agents for path-level security auditing and risk assessment.

Stable Agentic Control: Tool-Mediated LLM Architecture for Autonomous Cyber Defense

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

Tool-mediated LLM agents with deterministic tools and a machine-checked Lyapunov certificate achieve stable control in cyber defense, reducing attacker game value by 59% on real attack graphs.

Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents

cs.AI · 2026-04-27 · unverdicted · novelty 6.0

The paper introduces the Informational Viability Principle and Agent Viability Framework to govern autonomous AI agents by bounding unobserved risks using viability theory, with a new Viability Index for predictive control.

EASE Configuration Facilitates A Reproducible Science of LLM Social Simulations

cs.MA · 2026-05-28 · unverdicted · novelty 5.0

Authors define EASE as a modular architecture for LLM multi-agent simulations, implement it in the SiliSocS sandbox, and illustrate its use via three case studies on research questions in generated social scenarios.

Control Charts for Multi-agent Systems

cs.MA · 2026-05-11 · unverdicted · novelty 5.0

Adaptive control charts can monitor learning multi-agent systems but are vulnerable to gradual adversarial defection, revealing a fundamental tradeoff between allowing agents to learn and maintaining security against adversaries.

Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use

cs.CR · 2026-05-06 · unverdicted · novelty 5.0

A server-side architecture with policy-aware ingestion and ABAC-based retrieval gating prevents cross-tenant data leakage in multitenant enterprise RAG and agent systems.

Governed Reasoning for Institutional AI

cs.AI · 2026-04-12 · unverdicted · novelty 5.0

Cognitive Core uses nine typed cognitive primitives, a four-tier governance model with human review as an execution condition, and an endogenous audit ledger to reach 91% accuracy with zero silent errors on prior authorization appeals, outperforming ReAct and Plan-and-Solve baselines.

The AI Skills Shift: Mapping Skill Obsolescence, Emergence, and Transition Pathways in the LLM Era

cs.CL · 2026-04-08 · unverdicted · novelty 5.0

Benchmarking four LLMs on O*NET skills yields SAFI scores showing mathematics and programming as most automatable while active listening and reading comprehension are least, with 78.7% of real AI interactions being augmentation rather than replacement.

Emergent Social Intelligence Risks in Generative Multi-Agent Systems

cs.MA · 2026-03-29 · unverdicted · novelty 5.0

Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.

The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project

cs.LG · 2026-03-22 · unverdicted · novelty 5.0

The Workload-Router-Pool architecture is a 3D framework for LLM inference optimization that synthesizes prior vLLM work into a 3x3 interaction matrix and proposes 21 research directions at the intersections.

citing papers explorer

Showing 1 of 1 citing paper after filters.

How Your Credentials Are Leaked by LLM Agent Skills: An Empirical Study

cs.CR · 2026-04-03 · unreviewed · ref 50 · internal anchor
