Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role and citation-polarity summary
verdicts: UNVERDICTED 4
roles: background 2
polarities: background 2
representative citing papers
Introduces Trust-RAG Compass framework and TRC Bench benchmark to assess RAG trustworthiness across factuality, robustness, fairness, transparency, accountability, and privacy, with evaluations showing performance gaps between LLMs.
ACE decouples planning into abstract and concrete phases with static information-flow verification and enforces execution barriers to secure LLM app systems against prompt injection and related attacks.
Only output filtering with hardcoded rules in application code prevented prompt injection leaks in LLMs, as all model-based defenses were defeated by an adaptive attacker.
citing papers explorer
- AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
  AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.
- Trustworthiness in Retrieval-Augmented Generation Systems: A Survey
  Introduces Trust-RAG Compass framework and TRC Bench benchmark to assess RAG trustworthiness across factuality, robustness, fairness, transparency, accountability, and privacy, with evaluations showing performance gaps between LLMs.
- ACE: A Security Architecture for LLM-Integrated App Systems
  ACE decouples planning into abstract and concrete phases with static information-flow verification and enforces execution barriers to secure LLM app systems against prompt injection and related attacks.
- Evaluation of Prompt Injection Defenses in Large Language Models
  Only output filtering with hardcoded rules in application code prevented prompt injection leaks in LLMs, as all model-based defenses were defeated by an adaptive attacker.