WARD is a guard model trained on 177K web samples and adversarially hardened via attacker-guard co-evolution to achieve high recall on prompt injections with low false positives and no added latency.
Phishagent: A robust multimodal agent for phishing webpage detection. arXiv preprint arXiv:2408.10738, 2024
WARD: Adversarially Robust Defense of Web Agents Against Prompt Injections