POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems

Álvaro Gutiérrez; Annemarie F. Laudanski; Eduardo Rocon; Iñaki Dellibarda Varela; J.M. Valverde-García; Manuel Cebrian; Pablo Romero-Sorozabal; R. Sendra-Arranz

arxiv: 2606.02282 · v1 · pith:EZKV4R5Lnew · submitted 2026-06-01 · 💻 cs.AI

Iñaki Dellibarda Varela, R. Sendra-Arranz, Pablo Romero-Sorozabal, J.M. Valverde-García, Annemarie F. Laudanski, Álvaro Gutiérrez, Eduardo Rocon, Manuel Cebrian

keywords poirot · agents · fault · multi-agent · systems · failure · safety-critical · across

Orchestrating Large Language Models into Multi-Agent Systems (LLM-MAS) has unlocked remarkable reasoning capabilities, yet emergent failures and hallucinations that resist characterisation block their deployment in safety-critical domains -- a gap made legally untenable by emerging AI regulation. Existing evaluation paradigms share a common flaw: centralised judgment creates single points of failure and demands domain-specific expertise. Here we present POIROT, a protocol that repurposes a system's own agents as its diagnostic layer, leveraging the epistemic diversity already present in the architecture. Across evaluated settings, POIROT outperforms single-LLM evaluator baselines, with gains that scale with problem complexity (OR = 1.60, $p = 0.008$), agent count, and fault dimensionality, persisting under compound fault conditions. These results demonstrate that safety oversight need not be externalised: the agents executing a role carry sufficient collective intelligence to audit it. We release POIROT as an open-source library alongside BLAME, a benchmark for fault attribution in safety-critical multi-agent systems.
