AutoDojo adaptively optimizes IPI attacks to bypass defenses, recovering substantial ASR on action-open tasks where static attacks fail.
Defenses against prompt attacks learn surface heuristics,
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Agent safety cannot be achieved via model refusal training and instead requires external least-privilege enforcement evaluated as action alignment.
citing papers explorer
- AutoDojo: Adaptive Black-Box Attacks Reveal the Limits of IPI Defenses and Task-Specification Effects in LLM Agents
AutoDojo adaptively optimizes IPI attacks to bypass defenses, recovering substantial ASR on action-open tasks where static attacks fail.
- Agent Safety Is Action Alignment
Agent safety cannot be achieved via model refusal training and instead requires external least-privilege enforcement evaluated as action alignment.