Misusing Tools in Large Language Models With Visual Adversarial Examples

Earlence Fernandes; Niloofar Mireshghallah; Rajesh K. Gupta; Shuheng Li; Taylor Berg-Kirkpatrick; Xiaohan Fu; Zihan Wang

arxiv: 2310.03185 · v1 · pith:4VMZECKSnew · submitted 2023-10-04 · 💻 cs.CR · cs.AI

Xiaohan Fu, Zihan Wang, Shuheng Li, Rajesh K. Gupta, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Earlence Fernandes

keywords: adversarial, attacks, multiple, tools, affect, attacker, cause, examples

Large Language Models (LLMs) are being enhanced with the ability to use tools and to process multiple modalities. These new capabilities bring new benefits and also new security risks. In this work, we show that an attacker can use visual adversarial examples to cause attacker-desired tool usage. For example, the attacker could cause a victim LLM to delete calendar events, leak private conversations, and book hotels. Unlike prior work, our attacks can affect the confidentiality and integrity of user resources connected to the LLM while being stealthy and generalizable to multiple input prompts. We construct these attacks using gradient-based adversarial training and characterize performance along multiple dimensions. We find that our adversarial images can manipulate the LLM to invoke tools following real-world syntax almost always (~98%) while maintaining high similarity to clean images (~0.9 SSIM). Furthermore, using human scoring and automated metrics, we find that the attacks do not noticeably affect the conversation (and its semantics) between the user and the LLM.
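
To make the idea of a gradient-based visual adversarial example concrete, here is a minimal sketch under strong simplifying assumptions. It is not the paper's attack: a small random linear classifier stands in for the multimodal LLM, a single target class index stands in for the attacker-desired tool call, and a fixed per-pixel bound (`eps`) stands in for the image-similarity constraint that the abstract measures with SSIM. The names `targeted_attack`, `W`, `eps`, and `step` are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def targeted_attack(W, image, target, eps=0.05, step=0.005, iters=200):
    """Projected signed-gradient descent on a toy linear classifier.

    W      : (num_classes, num_pixels) weights of the stand-in model (assumption).
    image  : flat array of pixel values in [0, 1].
    target : index of the output the attacker wants the model to produce.
    eps    : maximum absolute change allowed per pixel (assumption).
    """
    delta = np.zeros_like(image)
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for _ in range(iters):
        probs = softmax(W @ (image + delta))
        # Gradient of the cross-entropy loss for `target` w.r.t. the input pixels.
        grad = W.T @ (probs - onehot)
        # Step against the gradient sign to lower the loss on the target output.
        delta = delta - step * np.sign(grad)
        # Project back: keep the perturbation small and the pixels valid.
        delta = np.clip(delta, -eps, eps)
        delta = np.clip(image + delta, 0.0, 1.0) - image
    return image + delta

# Toy usage: a random 5-class model over a 64-pixel "image".
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 64))
image = rng.uniform(0.2, 0.8, size=64)
adv = targeted_attack(W, image, target=3)

p_clean = softmax(W @ image)[3]
p_adv = softmax(W @ adv)[3]
max_change = np.max(np.abs(adv - image))
```

In this toy setting, `p_adv` can be compared with `p_clean` to see how far the bounded perturbation moves the model toward the target output, and `max_change` confirms that no pixel moved by more than `eps`. A real attack of the kind the abstract describes would replace the linear model with the victim LLM's image pathway and the single class with a full tool-invocation string.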

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
cs.CR 2024-06 unverdicted novelty 8.0

AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.

Towards an AI co-scientist
cs.AI 2025-02 unverdicted novelty 6.0

A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.