AutoBackdoor: Automating Backdoor Attacks via LLM Agents

Yige Li, Zhe Li, Wei Zhao, Nay Myat Min, Hanxun Huang, Xingjun Ma, Jun Sun · 2025 · arXiv 2511.16709

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences

cs.CR · 2026-05-18 · unverdicted · novelty 6.0

OEP poisons self-evolving LLM agents by constructing clean edge-case experiences that appear locally valid yet cause harmful over-generalization during reflection, achieving over 50% attack success rate on GPT-4o agents across three domains.

DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models

cs.CR · 2026-05-15 · unverdicted · novelty 6.0

DarkLLM trains an LLM to generate language-driven adversarial perturbations that unify targeted, untargeted, segmentation, and multi-model attacks on foundation models.

SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems

cs.CR · 2026-04-08

citing papers explorer

Showing 3 of 3 citing papers.

OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences · cs.CR · 2026-05-18 · unverdicted · none · ref 18
DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models · cs.CR · 2026-05-15 · unverdicted · none · ref 30
SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems · cs.CR · 2026-04-08 · unreviewed · ref 11
