Combines POS filtering and perplexity loss to generate sensible universal adversarial triggers that drop SST sentiment accuracy to 0.04-0.12, with adversarial training raising it to 0.48.
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CL (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Universal Adversarial Triggers