Roughly 1% of real resumes contain hidden prompt injections against LLM screeners, prevalence has risen over 1-2 years, and over 90% avoid explicit instructions.
hub Canonical reference
Prompt flow integrity to prevent privilege escalation in LLM agents
Canonical reference. 80% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 16representative citing papers
TRUSTDESC prevents tool poisoning in LLM applications by automatically generating accurate tool descriptions from code via a three-stage pipeline of reachability analysis, description synthesis, and dynamic verification.
The paper introduces Consent Integrity as the property that actions shown for approval must be rendered by a trusted mediator from the real boundary action over an unspoofable path and bound to execution, with uninspectable actions surfaced rather than silently approved.
SeClaw provides spec-driven synthesis of security tasks and an execution-based docker testbed for evaluating unsafe behaviors in autonomous LLM agents.
New benchmark Scammer4U finds 54-93% critical PII leakage from frontier web agents on scam sites versus 0% on benign twins, plus a 30-point gap between verbalized suspicion and actual submission.
AuthGraph aligns an execution provenance graph with a clean authorization graph to detect parameter-source deviations from user intent, reducing attack success rates to 1-2% on AgentDojo and AgentDyn while retaining most task utility.
BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.
SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.
SSL representation disentangles skill scheduling, structure, and logic using an LLM normalizer, improving skill discovery MRR@50 from 0.649 to 0.729 and risk assessment macro F1 from 0.409 to 0.509 over text baselines.
ChainCaps uses monotonic capability attenuation via intersection of sink-specific budgets in a transparent proxy to reduce attack success on composed tool-using agents from 25-68% to 0-4.8% while keeping 96-100% benign task completion.
LLM agent security is reframed as an agent-human interaction issue, supported by a survey showing industry preference for human-centric mechanisms over academic favorites and proposing a new research agenda.
Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.
PIArena provides a unified evaluation platform for prompt injection attacks and defenses, featuring a new adaptive attack that reveals major weaknesses in existing protections.
LLM agents enable universal interoperability by serving as automatic translators and adapters between proprietary digital services.
A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.
The Redpanda Agentic Data Plane uses out-of-band metadata channels to enforce data scoping, action constraints, and tamper-proof auditing on autonomous AI agents.
citing papers explorer
No citing papers match the current filters.