Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems

Abhinav Kumar; Eugene Bagdasarian; Ferdinando Fioretto; Mason Nakamura; Saaduddin Mahmud; Sahar Abdelnabi; Saswat Das; Shlomo Zilberstein

arxiv: 2602.15198 · v2 · pith:TDKZRLY4new · submitted 2026-02-16 · 💻 cs.MA · cs.AI · cs.CL

Mason Nakamura, Abhinav Kumar, Saswat Das, Sahar Abdelnabi, Saaduddin Mahmud, Ferdinando Fioretto, Shlomo Zilberstein, Eugene Bagdasarian

classification 💻 cs.MA · cs.AI · cs.CL

keywords agents, collusion, multi-agent, colosseum, cooperative, behavior, collusive, systems

Multi-agent systems, where LLM agents communicate through free-form language, enable sophisticated coordination for solving complex cooperative tasks. This surfaces a unique safety problem when a group of agents forms a coalition and colludes to pursue secondary goals and degrade the joint objective. In this paper, we present Colosseum, a framework for auditing LLM agents' collusive behavior in multi-agent settings. We ground how agents cooperate through a formal multi-agent decision-making framework, measure action-based collusive behavior via regret relative to the cooperative optimum, and compare it with communication-based collusive behavior. Colosseum enables audits of LLM agents for collusion under benign settings, different coalition objectives, persuasion tactics, and network topologies. We then introduce a new behavioral probe by creating secret communication channels between agents, showing that most out-of-the-box models exhibit a propensity to collude under this probe, which we term emergent collusion. Furthermore, we discover "collusion on paper," in which agents plan to collude in text but often pick non-collusive actions. Colosseum provides a new way to audit collusion in cooperative multi-agent systems while presenting observations about how collusion emerges, what affects collusion efficacy, and which strategies may mitigate it.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents
cs.CY 2026-04 accept novelty 8.0

This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that be...

Breaking the Secret: Economic Interventions for Combating Collusion in Embodied Multi-Agent Systems
cs.CR 2026-04 unverdicted novelty 7.0

A mutagenic incentive mechanism reshapes payoffs in embodied MAS to induce strategic defection from collusion, achieving performance comparable to non-collusion baselines in simulations and real-world tests.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
cs.CR 2026-04 unverdicted novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

Detecting Multi-Agent Collusion Through Multi-Agent Interpretability
cs.AI 2026-04 conditional novelty 7.0

NARCBench and five activation-probing methods detect multi-agent collusion with 0.73-1.00 AUROC across distribution shifts and steganographic tasks by aggregating per-agent signals.

Safe Multi-Agent Behavior Must Be Maintained, Not Merely Asserted: Constraint Drift in LLM-Based Multi-Agent Systems
cs.MA 2026-05 unverdicted novelty 5.0

Safety constraints in LLM-based multi-agent systems commonly weaken during execution through memory, communication, and tool use, requiring them to be maintained as explicit state rather than asserted once.