hub Canonical reference

Prompt flow integrity to prevent privilege escalation in LLM agents

Juhee Kim, Woohyuk Choi, Byoungyoung Lee · 2025 · arXiv 2503.15547

Canonical reference. 80% of citing Pith papers cite this work as background.

13 Pith papers citing it

Background 80% of classified citations

read on arXiv browse 13 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4 method 1

citation-polarity summary

background 4 use method 1

representative citing papers

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening

cs.CR · 2026-05-27 · unverdicted · novelty 8.0

Roughly 1% of real resumes contain hidden prompt injections against LLM screeners, prevalence has risen over 1-2 years, and over 90% avoid explicit instructions.

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

cs.CR · 2026-04-08 · unverdicted · novelty 8.0

TRUSTDESC prevents tool poisoning in LLM applications by automatically generating accurate tool descriptions from code via a three-stage pipeline of reachability analysis, description synthesis, and dynamic verification.

"I Strongly Suspect This Website Is a Scam": Benchmarking PII Leakage and Detection without Defense in Autonomous Web Agents

cs.CR · 2026-05-30 · unverdicted · novelty 6.0

New benchmark Scammer4U finds 54-93% critical PII leakage from frontier web agents on scam sites versus 0% on benign twins, plus a 30-point gap between verbalized suspicion and actual submission.

ChainCaps: Composition-Safe Tool-Using Agents via Monotonic Capability Attenuation

cs.CR · 2026-05-26 · unverdicted · novelty 6.0

ChainCaps prevents permission laundering in tool-using agents by enforcing monotonic capability attenuation through budget intersection, reducing attack success from 25-68% to 0-4.8% on 82 tasks while maintaining 96-100% benign performance.

Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents

cs.CR · 2026-05-26 · unverdicted · novelty 6.0

AuthGraph aligns an execution provenance graph with a clean authorization graph to detect parameter-source deviations from user intent, reducing attack success rates to 1-2% on AgentDojo and AgentDyn while retaining most task utility.

Behavioral Integrity Verification for AI Agent Skills

cs.CR · 2026-05-12 · unverdicted · novelty 6.0

BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.

SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.

From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills

cs.CL · 2026-04-27 · unverdicted · novelty 6.0

SSL representation disentangles skill scheduling, structure, and logic using an LLM normalizer, improving skill discovery MRR@50 from 0.649 to 0.729 and risk assessment macro F1 from 0.409 to 0.509 over text baselines.

Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

cs.SE · 2026-04-16 · unverdicted · novelty 5.0

Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.

PIArena: A Platform for Prompt Injection Evaluation

cs.CR · 2026-04-09 · unverdicted · novelty 5.0

PIArena provides a unified evaluation platform for prompt injection attacks and defenses, featuring a new adaptive attack that reveals major weaknesses in existing protections.

LLM Agents Are the Antidote to Walled Gardens

cs.LG · 2025-06-30 · unverdicted · novelty 4.0

LLM agents enable universal interoperability by serving as automatic translators and adapters between proprietary digital services.

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

cs.CR · 2026-06-09 · unverdicted · novelty 3.0

A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.

The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane

cs.AI · 2026-05-27 · unverdicted · novelty 3.0

The Redpanda Agentic Data Plane uses out-of-band metadata channels to enforce data scoping, action constraints, and tamper-proof auditing on autonomous AI agents.

citing papers explorer

Showing 12 of 12 citing papers after filters.

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening cs.CR · 2026-05-27 · unverdicted · none · ref 16
Roughly 1% of real resumes contain hidden prompt injections against LLM screeners, prevalence has risen over 1-2 years, and over 90% avoid explicit instructions.
TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation cs.CR · 2026-04-08 · unverdicted · none · ref 43
TRUSTDESC prevents tool poisoning in LLM applications by automatically generating accurate tool descriptions from code via a three-stage pipeline of reachability analysis, description synthesis, and dynamic verification.
"I Strongly Suspect This Website Is a Scam": Benchmarking PII Leakage and Detection without Defense in Autonomous Web Agents cs.CR · 2026-05-30 · unverdicted · none · ref 137
New benchmark Scammer4U finds 54-93% critical PII leakage from frontier web agents on scam sites versus 0% on benign twins, plus a 30-point gap between verbalized suspicion and actual submission.
ChainCaps: Composition-Safe Tool-Using Agents via Monotonic Capability Attenuation cs.CR · 2026-05-26 · unverdicted · none · ref 7
ChainCaps prevents permission laundering in tool-using agents by enforcing monotonic capability attenuation through budget intersection, reducing attack success from 25-68% to 0-4.8% on 82 tasks while maintaining 96-100% benign performance.
Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents cs.CR · 2026-05-26 · unverdicted · none · ref 10
AuthGraph aligns an execution provenance graph with a clean authorization graph to detect parameter-source deviations from user intent, reducing attack success rates to 1-2% on AgentDojo and AgentDyn while retaining most task utility.
Behavioral Integrity Verification for AI Agent Skills cs.CR · 2026-05-12 · unverdicted · none · ref 30
BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.
SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills cs.CR · 2026-05-07 · unverdicted · none · ref 24
SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.
From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills cs.CL · 2026-04-27 · unverdicted · none · ref 9
SSL representation disentangles skill scheduling, structure, and logic using an LLM normalizer, improving skill discovery MRR@50 from 0.649 to 0.729 and risk assessment macro F1 from 0.409 to 0.509 over text baselines.
Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility cs.SE · 2026-04-16 · unverdicted · none · ref 34
Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.
PIArena: A Platform for Prompt Injection Evaluation cs.CR · 2026-04-09 · unverdicted · none · ref 6
PIArena provides a unified evaluation platform for prompt injection attacks and defenses, featuring a new adaptive attack that reveals major weaknesses in existing protections.
Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation cs.CR · 2026-06-09 · unverdicted · none · ref 84
A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.
The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane cs.AI · 2026-05-27 · unverdicted · none · ref 15
The Redpanda Agentic Data Plane uses out-of-band metadata channels to enforce data scoping, action constraints, and tamper-proof auditing on autonomous AI agents.

Prompt flow integrity to prevent privilege escalation in LLM agents

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer