Agentsentry: Mitigating indirect prompt injection in llm agents via temporal causal diagnostics and context purification

· 2026 · arXiv 2602.22724

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

citation-role summary

background 4

citation-polarity summary

background 3 unclear 1

representative citing papers

AutoDojo: Adaptive Black-Box Attacks Reveal the Limits of IPI Defenses and Task-Specification Effects in LLM Agents

cs.CR · 2026-06-13 · unverdicted · novelty 7.0

AutoDojo adaptively optimizes IPI attacks to bypass defenses, recovering substantial ASR on action-open tasks where static attacks fail.

Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback

cs.CR · 2026-05-17 · unverdicted · novelty 7.0

Presents TRUST-Bench benchmark for hidden-trigger tool compromises in LLM agents and VISTA-Guard framework for trajectory-aware risk scoring of final actions under untrusted feedback.

FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems

cs.CR · 2026-05-12 · unverdicted · novelty 7.0

FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.

Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents

cs.CR · 2026-04-05 · unverdicted · novelty 7.0

The paper defines causality laundering as an attack leaking information from denial outcomes in LLM tool calls and proposes the Agentic Reference Monitor to block it using denial-aware provenance graphs.

PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections

cs.CR · 2026-06-10 · unverdicted · novelty 6.0

PI-Hunter automates red-teaming of LLM agents by generating and iteratively evolving source-aware test cases to induce retrieval of embedded malicious instructions from external environments.

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

cs.CR · 2026-05-29 · unverdicted · novelty 6.0

Introduces ClawTrojan benchmark achieving 95.5% ASR for multi-step trojan attacks in agentic harnesses and DASGuard defense that sanitizes control content from untrusted sources.

An Empirical Study of Privacy Leakage Chains via Prompt Injection in Black-Box Chatbot Environments

cs.CR · 2026-05-18 · unverdicted · novelty 6.0

Empirical demonstration that prompt injection combined with web-tool use creates a feasible privacy-leakage chain in deployed black-box chatbot agents.

When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

Routine user chats can unintentionally poison the long-term state of personalized LLM agents, causing authorization drift, tool escalation, and unchecked autonomy, as measured by a new benchmark and reduced by the StateGuard defense.

Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills

cs.CR · 2026-04-28 · unverdicted · novelty 5.0

SkillGuard-Robust formulates pre-load auditing of untrusted Agent Skills as a three-way classification task and achieves 97.30% exact match and 98.33% malicious-risk recall on held-out benchmarks.

Assessing Automated Prompt Injection Attacks in Agentic Environments

cs.CR · 2026-06-09 · unverdicted · novelty 4.0

Black-box optimization outperforms gradient-based methods for prompt injection on LLM agents, with success depending on attacker model strength and limited transfer from small to frontier models.

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

cs.CR · 2026-06-09 · unverdicted · novelty 3.0

A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.

citing papers explorer

Showing 11 of 11 citing papers.

AutoDojo: Adaptive Black-Box Attacks Reveal the Limits of IPI Defenses and Task-Specification Effects in LLM Agents cs.CR · 2026-06-13 · unverdicted · none · ref 37
AutoDojo adaptively optimizes IPI attacks to bypass defenses, recovering substantial ASR on action-open tasks where static attacks fail.
Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback cs.CR · 2026-05-17 · unverdicted · none · ref 34
Presents TRUST-Bench benchmark for hidden-trigger tool compromises in LLM agents and VISTA-Guard framework for trajectory-aware risk scoring of final actions under untrusted feedback.
FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems cs.CR · 2026-05-12 · unverdicted · none · ref 64
FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.
Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents cs.CR · 2026-04-05 · unverdicted · none · ref 31
The paper defines causality laundering as an attack leaking information from denial outcomes in LLM tool calls and proposes the Agentic Reference Monitor to block it using denial-aware provenance graphs.
PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections cs.CR · 2026-06-10 · unverdicted · none · ref 22
PI-Hunter automates red-teaming of LLM agents by generating and iteratively evolving source-aware test cases to induce retrieval of embedded malicious instructions from external environments.
From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors cs.CR · 2026-05-29 · unverdicted · none · ref 33
Introduces ClawTrojan benchmark achieving 95.5% ASR for multi-step trojan attacks in agentic harnesses and DASGuard defense that sanitizes control content from untrusted sources.
An Empirical Study of Privacy Leakage Chains via Prompt Injection in Black-Box Chatbot Environments cs.CR · 2026-05-18 · unverdicted · none · ref 32
Empirical demonstration that prompt injection combined with web-tool use creates a feasible privacy-leakage chain in deployed black-box chatbot agents.
When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents cs.CR · 2026-05-07 · unverdicted · none · ref 35
Routine user chats can unintentionally poison the long-term state of personalized LLM agents, causing authorization drift, tool escalation, and unchecked autonomy, as measured by a new benchmark and reduced by the StateGuard defense.
Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills cs.CR · 2026-04-28 · unverdicted · none · ref 28
SkillGuard-Robust formulates pre-load auditing of untrusted Agent Skills as a three-way classification task and achieves 97.30% exact match and 98.33% malicious-risk recall on held-out benchmarks.
Assessing Automated Prompt Injection Attacks in Agentic Environments cs.CR · 2026-06-09 · unverdicted · none · ref 57
Black-box optimization outperforms gradient-based methods for prompt injection on LLM agents, with success depending on attacker model strength and limited transfer from small to frontier models.
Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation cs.CR · 2026-06-09 · unverdicted · none · ref 245
A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.

Agentsentry: Mitigating indirect prompt injection in llm agents via temporal causal diagnostics and context purification

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer