Introduces MCP-TDP benchmark showing near-100% attack success on models like GPT-4o for tool description poisoning and proposes reactive self-correction defense.
hub
The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise
12 Pith papers cite this work. Polarity classification is still indexing.
abstract
The rapid adoption of Large Language Model (LLM) agents and multi-agent systems enables remarkable capabilities in natural language processing and generation. However, these systems introduce security vulnerabilities that extend beyond traditional content generation to system-level compromises. This paper presents a comprehensive evaluation of the LLMs security used as reasoning engines within autonomous agents, highlighting how they can be exploited as attack vectors capable of achieving computer takeovers. We focus on how different attack surfaces and trust boundaries can be leveraged to orchestrate such takeovers. We demonstrate that adversaries can effectively coerce popular LLMs into autonomously installing and executing malware on victim machines. Our evaluation of 18 state-of-the-art LLMs reveals that 94.4% of models succumb to Direct Prompt Injection, and 83.3% are vulnerable to the more stealthy and evasive RAG Backdoor Attack. Notably, we tested trust boundaries within multi-agent systems, where LLM agents interact and influence each other, and we revealed that LLMs which successfully resist direct injection or RAG backdoor attacks will execute identical payloads when requested by peer agents. We found that 100.0% of tested LLMs can be compromised through Inter-Agent Trust Exploitation attacks, and that every model exhibits context-dependent security behaviors that create exploitable blind spots.
hub tools
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 12roles
background 3polarities
background 3representative citing papers
Trace fingerprints AI penetration testing agents from terminal command sequences to identify model families and extracts their system prompts via targeted defensive prompt injection.
Dynamic separator generation via domain-separated SHA-256 reduces attack success rate from 0.88 to 0.38 and eliminates leakage exposure in evaluations against 16 payloads on Llama and DeepSeek models.
Memory-equipped LLM agents exhibit increasing safety violation rates as memory accumulates across independent tasks, termed temporal memory contamination, detected via a new trigger-probe protocol.
Multi-agent LLM frameworks can spread compromises across agent boundaries via insecure memory inheritance during subagent spawning.
A single legitimate request can cause LLM orchestrators to output plans that violate security policies through the composition of benign subtasks, bypassing subtask-level checks.
A graph-based propagation model for error cascades in LLM multi-agent systems plus a genealogy-graph governance plugin that prevents final infection in at least 89% of runs across tested frameworks.
A three-layer probabilistic assume-guarantee architecture is structurally required for safe LLM agent deployment.
Memory poisoning via lost-provenance documents in agent memory stores creates agent misconduct that safety systems misattribute to model failure; the paper defines Semantic Norm Drift, releases a benchmark, and proposes a new testing method plus a defense.
A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.
The paper analyzes security, privacy, and ethical risks in the OpenClaw AI agent system arising from its architecture, storage, tool use, and integrations, arguing these form major barriers to trustworthy adoption.
The paper analyzes evolving security and safety threats in generative AI from content generation to agentic actions, noting that attack surfaces expand faster than defenses and that many safeguards require institutional coordination not yet in place.
citing papers explorer
No citing papers match the current filters.