Introduces Unlearning Depth Score (UDS) via activation patching to quantify LLM unlearning depth and claims it outperforms 20 other metrics in faithfulness and robustness on 150 models.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Measuring the Depth of LLM Unlearning via Activation Patching
Introduces Unlearning Depth Score (UDS) via activation patching to quantify LLM unlearning depth and claims it outperforms 20 other metrics in faithfulness and robustness on 150 models.