LLM unlearning is reframed as inadvertently installing backdoor triggers on forget-tokens; Random Noise Augmentation is introduced as a defense that improves robustness with theoretical guarantees.
Following previous works (Li et al., 2024), we update three layers of parameters {l, l−1, l−2} of the model for memory efficiency
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Improving LLM Unlearning Robustness via Random Perturbations
LLM unlearning is reframed as inadvertently installing backdoor triggers on forget-tokens; Random Noise Augmentation is introduced as a defense that improves robustness with theoretical guarantees.