MultiPhishGuard: An Explainable and Adaptive Multi-Agent LLM System for Phishing Email Detection

Eric Spero; Giovanni Russello; Meng Wai Woo; Wei Gao; Yinuo Xue

arxiv: 2505.23803 · v2 · pith:24LEBKIKnew · submitted 2025-05-26 · 💻 cs.CR · cs.AI

Yinuo Xue, Eric Spero, Meng Wai Woo, Wei Gao, Giovanni Russello

classification 💻 cs.CR cs.AI

keywords detection, phishing, adversarial, email, agent, agents, multi-agent, multiphishguard

Phishing email detection faces significant challenges due to evolving adversarial tactics and heterogeneous attack patterns. Traditional approaches, such as rule-based filters and denylists, often struggle to keep pace, leading to missed detections and security risks. While machine learning methods have improved detection performance, they remain limited in adapting to novel and rapidly changing phishing strategies. We present MultiPhishGuard, an LLM-based multi-agent detection framework with learned coordination across specialized agents. The system consists of five cooperative agents (text, URL, metadata, explanation simplifier, and adversarial agents), with agent contributions dynamically weighted using Proximal Policy Optimization. To address emerging threats, the framework incorporates an adversarial training loop in which an LLM-based agent generates subtle, context-aware email variants to expose potential model weaknesses and improve robustness to ambiguous phishing cases. Experimental evaluations on public datasets show that MultiPhishGuard achieves stronger performance than established baselines, including Chain-of-Thought prompting and single-agent variants, as supported by ablation studies and comparative analyses. The system achieves an accuracy of 97.89%, with a false positive rate of 2.73% and a false negative rate of 0.20%. In addition, an explanation simplifier agent transforms technical model outputs into plain-language rationales intended for human users. Overall, these results suggest that multi-agent LLM architectures with adaptive coordination and adversarial training represent a promising direction for phishing email detection.
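The weighted combination of agent outputs described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `AgentVerdict` type, the `fuse_verdicts` function, the 0.5 threshold, and the fixed weights are all hypothetical, and it assumes that only the text, URL, and metadata agents emit a phishing probability. In the paper the weights are adjusted dynamically with Proximal Policy Optimization; here they are hard-coded placeholders.

```python
from dataclasses import dataclass


@dataclass
class AgentVerdict:
    """Hypothetical container for one detection agent's output."""
    name: str
    phishing_prob: float  # estimated probability that the email is phishing, in [0, 1]


def fuse_verdicts(verdicts, weights, threshold=0.5):
    """Combine per-agent probabilities into one score via a weighted average.

    `weights` maps agent name -> non-negative weight. In the paper these
    weights are learned with PPO; this sketch simply takes them as given.
    """
    total = sum(weights[v.name] for v in verdicts)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(weights[v.name] * v.phishing_prob for v in verdicts) / total
    return score, score >= threshold


# Illustrative values only, not results from the paper.
verdicts = [
    AgentVerdict("text", 0.9),
    AgentVerdict("url", 0.6),
    AgentVerdict("metadata", 0.2),
]
weights = {"text": 0.5, "url": 0.3, "metadata": 0.2}  # placeholder weights
score, is_phishing = fuse_verdicts(verdicts, weights)
```

With these placeholder inputs, the text agent's high confidence dominates the low metadata score, so the fused score crosses the threshold. A learned policy would shift the weights per email rather than keep them fixed.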

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SoK: Exposing the Generation and Detection Gaps in LLM-Generated Phishing
cs.CR · 2025-08 · unverdicted · novelty 7.0

This SoK paper introduces a nine-stage taxonomy for LLM guardrail breaches in phishing, characterizes evasion and manipulation tactics, and identifies a dynamic-offense versus static-defense asymmetry.