hub

The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise

Lupinacci, M · 2025 · cs.CR · arXiv 2507.06850

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

open full Pith review browse 12 citing papers arXiv PDF

abstract

The rapid adoption of Large Language Model (LLM) agents and multi-agent systems enables remarkable capabilities in natural language processing and generation. However, these systems introduce security vulnerabilities that extend beyond traditional content generation to system-level compromises. This paper presents a comprehensive evaluation of the LLMs security used as reasoning engines within autonomous agents, highlighting how they can be exploited as attack vectors capable of achieving computer takeovers. We focus on how different attack surfaces and trust boundaries can be leveraged to orchestrate such takeovers. We demonstrate that adversaries can effectively coerce popular LLMs into autonomously installing and executing malware on victim machines. Our evaluation of 18 state-of-the-art LLMs reveals that 94.4% of models succumb to Direct Prompt Injection, and 83.3% are vulnerable to the more stealthy and evasive RAG Backdoor Attack. Notably, we tested trust boundaries within multi-agent systems, where LLM agents interact and influence each other, and we revealed that LLMs which successfully resist direct injection or RAG backdoor attacks will execute identical payloads when requested by peer agents. We found that 100.0% of tested LLMs can be compromised through Inter-Agent Trust Exploitation attacks, and that every model exhibits context-dependent security behaviors that create exploitable blind spots.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

When the Manual Lies: A Realistic Benchmark to Evaluate MCP Poisoning Attacks for LLM Agents

cs.CR · 2026-05-22 · unverdicted · novelty 7.0

Introduces MCP-TDP benchmark showing near-100% attack success on models like GPT-4o for tool description poisoning and proposes reactive self-correction defense.

Trace: Unmasking AI Attack Agents Through Terminal Behavior Fingerprinting

cs.CR · 2026-05-02 · unverdicted · novelty 7.0

Trace fingerprints AI penetration testing agents from terminal command sequences to identify model families and extracts their system prompts via targeted defensive prompt injection.

Strengthening Polymorphic Prompt Assembling: Dynamic Separator Generation Against Emerging Prompt Injection Attacks

cs.CR · 2026-05-28 · unverdicted · novelty 6.0

Dynamic separator generation via domain-separated SHA-256 reduces attack success rate from 0.88 to 0.38 and eliminates leakage exposure in evaluations against 16 payloads on Llama and DeepSeek models.

Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents

cs.AI · 2026-05-18 · unverdicted · novelty 6.0

Memory-equipped LLM agents exhibit increasing safety violation rates as memory accumulates across independent tasks, termed temporal memory contamination, detected via a new trigger-probe protocol.

When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks

cs.CR · 2026-05-08 · unverdicted · novelty 6.0

Multi-agent LLM frameworks can spread compromises across agent boundaries via insecure memory inheritance during subagent spawning.

Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines

cs.CR · 2026-04-08 · unverdicted · novelty 6.0

A single legitimate request can cause LLM orchestrators to output plans that violate security policies through the composition of benign subtasks, bypassing subtask-level checks.

From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration

cs.MA · 2026-03-04 · unverdicted · novelty 6.0

A graph-based propagation model for error cascades in LLM multi-agent systems plus a genealogy-graph governance plugin that prevents final infection in at least 89% of runs across tested frameworks.

Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

A three-layer probabilistic assume-guarantee architecture is structurally required for safe LLM agent deployment.

The Misattribution Gap: When Memory Poisoning Looks Like Model Failure in Agentic AI Systems

cs.CR · 2026-05-12 · unverdicted · novelty 5.0

Memory poisoning via lost-provenance documents in agent memory stores creates agent misconduct that safety systems misattribute to model failure; the paper defines Semantic Norm Drift, releases a benchmark, and proposes a new testing method plus a defense.

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

cs.AI · 2025-10-27 · unverdicted · novelty 4.0

A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.

Security, Privacy, and Ethical Risks in OpenClaw

cs.CR · 2026-05-22 · unverdicted · novelty 3.0

The paper analyzes security, privacy, and ethical risks in the OpenClaw AI agent system arising from its architecture, storage, tool use, and integrations, arguing these form major barriers to trustworthy adoption.

From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI

cs.CR · 2026-05-15 · unverdicted · novelty 3.0

The paper analyzes evolving security and safety threats in generative AI from content generation to agentic actions, noting that attack surfaces expand faster than defenses and that many safeguards require institutional coordination not yet in place.

citing papers explorer

Showing 12 of 12 citing papers.

When the Manual Lies: A Realistic Benchmark to Evaluate MCP Poisoning Attacks for LLM Agents cs.CR · 2026-05-22 · unverdicted · none · ref 21 · internal anchor
Introduces MCP-TDP benchmark showing near-100% attack success on models like GPT-4o for tool description poisoning and proposes reactive self-correction defense.
Trace: Unmasking AI Attack Agents Through Terminal Behavior Fingerprinting cs.CR · 2026-05-02 · unverdicted · none · ref 14 · internal anchor
Trace fingerprints AI penetration testing agents from terminal command sequences to identify model families and extracts their system prompts via targeted defensive prompt injection.
Strengthening Polymorphic Prompt Assembling: Dynamic Separator Generation Against Emerging Prompt Injection Attacks cs.CR · 2026-05-28 · unverdicted · none · ref 3 · internal anchor
Dynamic separator generation via domain-separated SHA-256 reduces attack success rate from 0.88 to 0.38 and eliminates leakage exposure in evaluations against 16 payloads on Llama and DeepSeek models.
Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents cs.AI · 2026-05-18 · unverdicted · none · ref 17 · internal anchor
Memory-equipped LLM agents exhibit increasing safety violation rates as memory accumulates across independent tasks, termed temporal memory contamination, detected via a new trigger-probe protocol.
When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks cs.CR · 2026-05-08 · unverdicted · none · ref 49 · internal anchor
Multi-agent LLM frameworks can spread compromises across agent boundaries via insecure memory inheritance during subagent spawning.
Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines cs.CR · 2026-04-08 · unverdicted · none · ref 4 · internal anchor
A single legitimate request can cause LLM orchestrators to output plans that violate security policies through the composition of benign subtasks, bypassing subtask-level checks.
From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration cs.MA · 2026-03-04 · unverdicted · none · ref 33 · internal anchor
A graph-based propagation model for error cascades in LLM multi-agent systems plus a genealogy-graph governance plugin that prevents final infection in at least 89% of runs across tested frameworks.
Position: A Three-Layer Probabilistic Assume-Guarantee Architecture Is Structurally Required for Safe LLM Agent Deployment cs.AI · 2026-05-18 · unverdicted · none · ref 36 · internal anchor
A three-layer probabilistic assume-guarantee architecture is structurally required for safe LLM agent deployment.
The Misattribution Gap: When Memory Poisoning Looks Like Model Failure in Agentic AI Systems cs.CR · 2026-05-12 · unverdicted · none · ref 19 · internal anchor
Memory poisoning via lost-provenance documents in agent memory stores creates agent misconduct that safety systems misattribute to model failure; the paper defines Semantic Norm Drift, releases a benchmark, and proposes a new testing method plus a defense.
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges cs.AI · 2025-10-27 · unverdicted · none · ref 40 · internal anchor
A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.
Security, Privacy, and Ethical Risks in OpenClaw cs.CR · 2026-05-22 · unverdicted · none · ref 8 · internal anchor
The paper analyzes security, privacy, and ethical risks in the OpenClaw AI agent system arising from its architecture, storage, tool use, and integrations, arguing these form major barriers to trustworthy adoption.
From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI cs.CR · 2026-05-15 · unverdicted · none · ref 76 · internal anchor
The paper analyzes evolving security and safety threats in generative AI from content generation to agentic actions, noting that attack surfaces expand faster than defenses and that many safeguards require institutional coordination not yet in place.

The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer