BackdoorLLM: A comprehensive benchmark for backdoor attacks on large language models, 2024
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CR (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- SLIP: Soft Label Mechanism and Key-Extraction-Guided CoT-based Defense Against Instruction Backdoor in APIs
SLIP combines a soft label mechanism with key-extraction-guided CoT to reduce instruction backdoor attack success rate to 25.13% and raise clean accuracy to 87.15% in LLM agents.