Backdoor attacks for in-context learning with language models

Nikhil Kandpal, Matthew Jagielski, Florian Tramèr, Nicholas Carlini · 2023 · arXiv 2307.14692

3 Pith papers cite this work. Polarity classification is still being indexed.


representative citing papers

A First Look at the Security Issues in the Model Context Protocol Ecosystem

cs.CR · 2025-10-18 · conditional · novelty 8.0

Analysis of 67,057 servers across six registries reveals widespread conditions for server hijacking and metadata manipulation in MCP, with a new tool MCPInspect flagging 833 vulnerable servers and 18 with suspicious descriptions.

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

cs.CR · 2024-10-03 · unverdicted · novelty 7.0

ASB is a new benchmark that tests 10 prompt injection attacks, memory poisoning, a novel Plan-of-Thought backdoor attack, and 11 defenses on LLM agents across 13 models, finding attack success rates up to 84.3% and limited defense effectiveness.

Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

cs.CR · 2026-04-23 · unverdicted · novelty 6.0

BadStyle creates stealthy backdoors in LLMs by poisoning samples with imperceptible style triggers and using an auxiliary loss to stabilize payload injection, achieving high attack success rates across multiple models while evading defenses.

citing papers explorer

Showing 3 of 3 citing papers.

A First Look at the Security Issues in the Model Context Protocol Ecosystem · cs.CR · 2025-10-18 · conditional · none · ref 18
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents · cs.CR · 2024-10-03 · unverdicted · none · ref 111
Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers · cs.CR · 2026-04-23 · unverdicted · none · ref 27
