Evaluating LLMs’ assessment of mixed-context hallucination through the lens of summarization
2 Pith papers cite this work.
citing papers explorer
- POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems
The POIROT protocol repurposes the agents in an LLM multi-agent system as an internal diagnostic layer for failure detection. It outperforms single-LLM evaluators, with gains that grow with complexity, agent count, and fault types.
- CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification
CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries using GraphRAG knowledge graphs and multi-agent evidence grading. With a fine-tuned Qwen3-14B model, it achieves 0.831 F1 on critical contradictions and a 50% relative improvement over baselines.