AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.
Can LLMs Sepa- rate Instructions From Data? And What Do We Even Mean By That?
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CR 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
LocalAlign generates near-target adversarial examples via prompting and applies margin-aware alignment training to enforce tighter boundaries against prompt injection attacks.
The paper analyzes evolving security and safety threats in generative AI from content generation to agentic actions, noting that attack surfaces expand faster than defenses and that many safeguards require institutional coordination not yet in place.
citing papers explorer
-
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.
-
LocalAlign: Enabling Generalizable Prompt Injection Defense via Generation of Near-Target Adversarial Examples for Alignment Training
LocalAlign generates near-target adversarial examples via prompting and applies margin-aware alignment training to enforce tighter boundaries against prompt injection attacks.
-
From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI
The paper analyzes evolving security and safety threats in generative AI from content generation to agentic actions, noting that attack surfaces expand faster than defenses and that many safeguards require institutional coordination not yet in place.