Intrinsic Test of Unlearning Using Parametric Knowledge Traces
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers:
LMs store facts in task-specific parameter subsets, shown by inconsistent emergence across tasks during training and distinct localized parameters for the same fact.
EMBER augments existing erasure methods by precisely removing concept features from embeddings via sparse matrix factorization, cutting relearning recovery to 35% on Llama-3.1-8B from 70-76%.
citing papers explorer
- Measuring the Depth of LLM Unlearning via Activation Patching
  Introduces Unlearning Depth Score (UDS) via activation patching to quantify LLM unlearning depth and claims it outperforms 20 other metrics in faithfulness and robustness on 150 models.
- LMs as Task-Specific Knowledge Bases: An Interpretability Analysis
  LMs store facts in task-specific parameter subsets, shown by inconsistent emergence across tasks during training and distinct localized parameters for the same fact.
- Don't Forget Your Embeddings: Robust Knowledge Erasure via Precise Editing of Embeddings
  EMBER augments existing erasure methods by precisely removing concept features from embeddings via sparse matrix factorization, cutting relearning recovery to 35% on Llama-3.1-8B from 70-76%.