MultiPhishGuard: An Explainable and Adaptive Multi-Agent LLM System for Phishing Email Detection
read the original abstract
Phishing email detection faces significant challenges due to evolving adversarial tactics and heterogeneous attack patterns. Traditional approaches, such as rule-based filters and denylists, often struggle to keep pace, leading to missed detections and security risks. While machine learning methods have improved detection performance, they remain limited in adapting to novel and rapidly changing phishing strategies. We present MultiPhishGuard, an LLM-based multi-agent detection framework with learned coordination across specialized agents. The system consists of five cooperative agents (text, URL, metadata, explanation simplifier, and adversarial agents), with agent contributions dynamically weighted using Proximal Policy Optimization. To address emerging threats, the framework incorporates an adversarial training loop in which an LLM-based agent generates subtle, context-aware email variants to expose potential model weaknesses and improve robustness to ambiguous phishing cases. Experimental evaluations on public datasets show that MultiPhishGuard achieves stronger performance than established baselines, including Chain-of-Thought prompting and single-agent variants, as supported by ablation studies and comparative analyses. The system achieves an accuracy of 97.89%, with a false positive rate of 2.73% and a false negative rate of 0.20%. In addition, an explanation simplifier agent transforms technical model outputs into plain-language rationales intended for human users. Overall, these results suggest that multi-agent LLM architectures with adaptive coordination and adversarial training represent a promising direction for phishing email detection.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
SoK: Exposing the Generation and Detection Gaps in LLM-Generated Phishing
This SoK paper introduces a nine-stage taxonomy for LLM guardrail breaches in phishing, characterizes evasion and manipulation tactics, and identifies a dynamic-offense versus static-defense asymmetry.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.