BadStyle creates stealthy backdoors in LLMs by poisoning samples with imperceptible style triggers and using an auxiliary loss to stabilize payload injection, achieving high attack success rates across multiple models while evading defenses.
A survey on large language model (llm) security and privacy: The good, the bad, and the ugly,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CR 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers
BadStyle creates stealthy backdoors in LLMs by poisoning samples with imperceptible style triggers and using an auxiliary loss to stabilize payload injection, achieving high attack success rates across multiple models while evading defenses.