Do unlearning methods remove information from language model weights?

arXiv preprint arXiv:2410.08827

Aghyad Deeb, Fabien Roger · 2024 · arXiv 2410.08827

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Efficient Unlearning through Maximizing Relearning Convergence Delay

cs.LG · 2026-04-10 · unverdicted · novelty 7.0

The Influence Eliminating Unlearning framework maximizes relearning convergence delay via weight decay and noise injection to remove the influence of a forgetting set while preserving accuracy on retained data.

Is your algorithm unlearning or untraining?

cs.LG · 2026-04-09 · conditional · novelty 7.0

Machine unlearning conflates reversing the influence of specific training examples (untraining) with removing the full underlying distribution or behavior (unlearning).

Improving LLM Unlearning Robustness via Random Perturbations

cs.CL · 2025-01-31 · unverdicted · novelty 7.0

LLM unlearning is reframed as inadvertently installing backdoor triggers on forget-tokens; Random Noise Augmentation is introduced as a defense that improves robustness with theoretical guarantees.

Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

Targeting minor components in LLM representations during unlearning yields substantially better resistance to relearning attacks than prior methods.

WIN-U: Woodbury-Informed Newton-Unlearning as a retain-free Machine Unlearning Framework

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

WIN-U delivers a retain-free unlearning update that approximates the gold-standard retrained model via a Woodbury-informed Newton step using only forget-set curvature information.

Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning

cs.LG · 2025-10-01 · conditional · novelty 6.0

Downgrading optimizers to lower-information variants during LLM unlearning yields more robust forgetting on MUSE and WMDP benchmarks by converging to harder-to-perturb loss basins.

citing papers explorer

Showing 6 of 6 citing papers.

Efficient Unlearning through Maximizing Relearning Convergence Delay cs.LG · 2026-04-10 · unverdicted · none · ref 11
Is your algorithm unlearning or untraining? cs.LG · 2026-04-09 · conditional · none · ref 9
Improving LLM Unlearning Robustness via Random Perturbations cs.CL · 2025-01-31 · unverdicted · none · ref 7
Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter cs.CL · 2026-05-12 · unverdicted · none · ref 2
WIN-U: Woodbury-Informed Newton-Unlearning as a retain-free Machine Unlearning Framework cs.LG · 2026-04-15 · unverdicted · none · ref 2
Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning cs.LG · 2025-10-01 · conditional · none · ref 43
