Sequential LLM defense deployment leads to risk exacerbation in 38.9% of cases due to anti-aligned updates in shared critical layers, addressed by conflict-guided layer freezing.
Invariance makes llm unlearning resilient even to unanticipated downstream fine-tuning
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
CONDITIONAL 2representative citing papers
Downgrading optimizers to lower-information variants during LLM unlearning yields more robust forgetting on MUSE and WMDP benchmarks by converging to harder-to-perturb loss basins.
citing papers explorer
-
Defenses at Odds: Measuring and Explaining Defense Conflicts in Large Language Models
Sequential LLM defense deployment leads to risk exacerbation in 38.9% of cases due to anti-aligned updates in shared critical layers, addressed by conflict-guided layer freezing.
-
Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning
Downgrading optimizers to lower-information variants during LLM unlearning yields more robust forgetting on MUSE and WMDP benchmarks by converging to harder-to-perturb loss basins.