The work introduces and partially evaluates seven cross-domain prompt injection detectors, reporting F1 gains on benchmarks like deepset/prompt-injections and indirect-injection sets via local alignment, stylometry, and fatigue tracking.
Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents
4 Pith papers cite this work.
years: 2026 (4)
representative citing papers
Prompt injection defenses create a security-fidelity tradeoff: no model or defense achieves both high security and high fidelity on the SecFid benchmark across 1,168 examples.
citing papers explorer
- Adversarial Feeds Steer LLM Agent Decisions Against Their Defaults
Controlled experiments show adversarial feeds can tip uncertain LLM agent decisions from 5% to 100% alignment with the feed while leaving firmly held defaults unchanged, following a dose-response pattern across multiple models and domains.