Sparse autoencoders (SAEs) detect concepts well in diffusion models but fail as direct intervention points for unlearning; a detection-guided patch replacement method yields significantly cleaner erasure.
Machine unlearning: A survey
6 Pith papers cite this work.
Representative citing papers
MemoRepair formalizes the cascade update problem in agentic memory and solves it via a min-cut reduction that reduces exposure to invalidated memories to 0% while recovering 91-94% of valid successors at 57-76% of the baseline repair cost.
LLM unlearning is reframed as inadvertently installing backdoor triggers on forget-tokens; Random Noise Augmentation is introduced as a defense that improves robustness with theoretical guarantees.
CSC identifies backdoored samples via early-epoch latent clustering and conceals them by relabeling them to a virtual class, driving attack success rates near zero on benchmarks with little loss in clean accuracy.
CiPO removes undesired knowledge from both intermediate reasoning steps and final answers in large reasoning models by iteratively optimizing preferences toward valid counterfactual traces while keeping overall reasoning performance intact.
A LoRA-based residual feature alignment method for efficient machine unlearning on pre-trained models that targets zero residuals on retained data and shifted residuals on unlearned data.
Citing papers explorer
Machine Unlearning on Pre-trained Models by Residual Feature Alignment Using LoRA