ChatGPT as an attack tool: Stealthy textual backdoor attack via blackbox generative model trigger
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
SLIP: Soft Label Mechanism and Key-Extraction-Guided CoT-based Defense Against Instruction Backdoor in APIs
SLIP combines a soft label mechanism with key-extraction-guided CoT to reduce instruction backdoor attack success rate to 25.13% and raise clean accuracy to 87.15% in LLM agents.