Oracle Poisoning corrupts knowledge graphs used by AI agents via tool calls, leading tested models to accept fabricated claims at 100% under directed queries in a production-scale demonstration.
ConfusedPilot: Confused deputy risks in RAG-based LLMs
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6roles
background 2polarities
background 2representative citing papers
Coding agents struggle to infer least-privilege file permissions by omitting needed accesses while granting unused or sensitive ones, but Sufficiency-Tightness Decomposition improves sensitive-task success by up to 15.8% and reduces attacks.
Malicious Skills induce coding agents to hallucinate and import attacker-controlled packages at high rates while evading detection.
RAGShield detects all numerical manipulations in government RAG systems via pattern-based value extraction and cross-source verification, achieving 0% attack success rate on 430 real IRS-derived attacks where embedding defenses miss 79-90%.
Frontier AI agents introduce new confidentiality, integrity, and availability risks through changed assumptions on code-data separation and authority boundaries, requiring layered defenses like sandboxing and policy enforcement.
citing papers explorer
-
Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning
Oracle Poisoning corrupts knowledge graphs used by AI agents via tool calls, leading tested models to accept fabricated claims at 100% under directed queries in a production-scale demonstration.
-
Do Coding Agents Understand Least-Privilege Authorization?
Coding agents struggle to infer least-privilege file permissions by omitting needed accesses while granting unused or sensitive ones, but Sufficiency-Tightness Decomposition improves sensitive-task success by up to 15.8% and reduces attacks.
-
Trust Me, Import This: Dependency Steering Attacks via Malicious Agent Skills
Malicious Skills induce coding agents to hallucinate and import attacker-controlled packages at high rates while evading detection.
-
RAGShield: Detecting Numerical Claim Manipulation in Government RAG Systems
RAGShield detects all numerical manipulations in government RAG systems via pattern-based value extraction and cross-source verification, achieving 0% attack success rate on 430 real IRS-derived attacks where embedding defenses miss 79-90%.
-
Security Considerations for Artificial Intelligence Agents
Frontier AI agents introduce new confidentiality, integrity, and availability risks through changed assumptions on code-data separation and authority boundaries, requiring layered defenses like sandboxing and policy enforcement.
- SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces