Title resolution pending

Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang · 2024 · Findings of the Association for Computational Linguistics ACL 2024 · DOI 10.18653/v1/2024.findings-acl.624

17 Pith papers cite this work, alongside 37 external citations. Polarity classification is still indexing.

37 external citations · Crossref

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Ghost in the Agent: Redefining Information Flow Tracking for LLM Agents

cs.CR · 2026-04-25 · unverdicted · novelty 8.0

NeuroTaint is the first taint tracking framework for LLM agents that uses offline auditing of semantic, causal, and persistent context to detect flows from untrusted sources to privileged sinks.

Enhancing Agent Safety Judgment: Controlled Benchmark Rewriting and Analogical Reasoning for Deceptive Out-of-Distribution Scenarios

cs.AI · 2026-05-05 · unverdicted · novelty 7.0

ROME generates deceptive safety benchmarks that degrade LLM agent judgment performance, while ARISE uses analogical retrieval to improve safety decisions at inference time without retraining.

When Alignment Isn't Enough: Response-Path Attacks on LLM Agents

cs.CR · 2026-05-04 · unverdicted · novelty 7.0

A malicious relay can strategically rewrite aligned LLM outputs in BYOK agent architectures to achieve up to 99.1% attack success on benchmarks like AgentDojo and ASB.

Many-Tier Instruction Hierarchy in LLM Agents

cs.CL · 2026-04-10 · unverdicted · novelty 7.0

ManyIH and ManyIH-Bench address instruction conflicts in LLM agents with up to 12 privilege levels across 853 tasks, revealing frontier models achieve only ~40% accuracy.

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

cs.CR · 2024-10-03 · unverdicted · novelty 7.0

ASB is a new benchmark that tests 10 prompt injection attacks, memory poisoning, a novel Plan-of-Thought backdoor attack, and 11 defenses on LLM agents across 13 models, finding attack success rates up to 84.3% and limited defense effectiveness.

Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

cs.CR · 2026-05-21 · conditional · novelty 6.0

Domain-camouflaged injection attacks reduce detection rates from 93.8% to 9.7% on Llama 3.1 8B and 100% to 55.6% on Gemini 2.0 Flash, with the gap persisting in production classifiers and multi-agent debate setups.

ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents

cs.CR · 2026-05-17 · conditional · novelty 6.0

Clarification-seeking in LLM agents amplifies prompt injection attack success from ~2% to over 30% across ten frontier models in a new 728-scenario benchmark.

AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills

cs.CR · 2026-05-13 · conditional · novelty 6.0

AgentTrap shows that current LLM agents typically complete user tasks while silently accepting unsafe side effects from malicious third-party skills rather than refusing them.

When Agents Overtrust Environmental Evidence: An Extensible Agentic Framework for Benchmarking Evidence-Grounding Defects in LLM Agents

cs.AI · 2026-05-09 · unverdicted · novelty 6.0 · 2 refs

EnvTrustBench is a new agentic benchmark that measures evidence-grounding defects where LLM agents overtrust faulty environmental observations and take incorrect actions.

LoopTrap: Termination Poisoning Attacks on LLM Agents

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

LoopTrap is an automated red-teaming framework that crafts termination-poisoning prompts to amplify LLM agent steps by 3.57x on average (up to 25x) across 8 agents.

When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems

cs.CR · 2026-05-01 · unverdicted · novelty 6.0

Embedding-based defenses fail against attacks that align malicious message embeddings with benign ones in LLM multi-agent systems, but token-level confidence scores improve robustness by enabling better pruning of suspicious messages.

Alignment Contracts for Agentic Security Systems

cs.CR · 2026-04-30 · conditional · novelty 6.0

Alignment contracts define scope, allowed effects, budgets and disclosure rules as safety properties over finite effect traces, with decidable admissibility, refinement rules, and Lean-verified soundness under an observability assumption.

Contextual Agentic Memory is a Memo, Not True Memory

cs.AI · 2026-04-30 · unverdicted · novelty 6.0

Agentic memory is lookup-based retrieval, not weight-based consolidation, creating a generalization ceiling on novel tasks and structural vulnerability to memory poisoning.

RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents

cs.CR · 2026-04-24 · unverdicted · novelty 6.0

RouteGuard uses response-conditioned attention and hidden-state alignment to detect skill poisoning in LLM agents, achieving 0.8834 F1 on Skill-Inject benchmarks and recovering 90.51% of attacks missed by lexical screening.

ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents

cs.CL · 2025-09-26 · conditional · novelty 6.0

ChatInject exploits LLM chat template structures to boost indirect prompt injection success rates on agents from ~5-15% to 32-52% across benchmarks, with multi-turn persuasion variants performing best.

Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly

cs.CR · 2026-05-02 · unverdicted · novelty 5.0 · 2 refs

The paper measures policy-carriage failures during LLM context assembly and evaluates SafeContext as a partial mitigation on Llama, Qwen, and Mistral models.

STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems

cs.AI · 2026-04-11 · unverdicted · novelty 4.0

STARS fuses static priors and contextual risk scoring for agent skill invocations, achieving modest AUPRC gains on prompt injection attacks in a new SIA-Bench but concluding it supplements rather than replaces static auditing.

citing papers explorer

Showing 17 of 17 citing papers.

Ghost in the Agent: Redefining Information Flow Tracking for LLM Agents

cs.CR · 2026-04-25 · unverdicted · none · ref 38

Enhancing Agent Safety Judgment: Controlled Benchmark Rewriting and Analogical Reasoning for Deceptive Out-of-Distribution Scenarios

cs.AI · 2026-05-05 · unverdicted · none · ref 53

When Alignment Isn't Enough: Response-Path Attacks on LLM Agents

cs.CR · 2026-05-04 · unverdicted · none · ref 69

Many-Tier Instruction Hierarchy in LLM Agents

cs.CL · 2026-04-10 · unverdicted · none · ref 32

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

cs.CR · 2024-10-03 · unverdicted · none · ref 160

Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

cs.CR · 2026-05-21 · conditional · none · ref 1

ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents

cs.CR · 2026-05-17 · conditional · none · ref 13

AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills

cs.CR · 2026-05-13 · conditional · none · ref 11

When Agents Overtrust Environmental Evidence: An Extensible Agentic Framework for Benchmarking Evidence-Grounding Defects in LLM Agents

cs.AI · 2026-05-09 · unverdicted · none · ref 27 · 2 links

LoopTrap: Termination Poisoning Attacks on LLM Agents

cs.CR · 2026-05-07 · unverdicted · none · ref 52

When Embedding-Based Defenses Fail: Rethinking Safety in LLM-Based Multi-Agent Systems

cs.CR · 2026-05-01 · unverdicted · none · ref 36

Alignment Contracts for Agentic Security Systems

cs.CR · 2026-04-30 · conditional · full · ref 48

Contextual Agentic Memory is a Memo, Not True Memory

cs.AI · 2026-04-30 · unverdicted · none · ref 4

RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents

cs.CR · 2026-04-24 · unverdicted · none · ref 15

ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents

cs.CL · 2025-09-26 · conditional · none · ref 16

Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly

cs.CR · 2026-05-02 · unverdicted · none · ref 40 · 2 links

STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems

cs.AI · 2026-04-11 · unverdicted · none · ref 4
