GUARD-IT performs machine unlearning in LLMs via input-dependent activation steering at inference time, matching or exceeding gradient-based baselines on TOFU and MUSE while preserving utility and working under quantization.
Shen and Xinchi Qiu and Meghdad Kurmanji and Alex Iacob and Lorenzo Sani and Yihong Chen and Nicola Cancedda and Nicholas D
4 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 4representative citing papers
MPU is a framework that achieves privacy-preserving unlearning for LLMs by distributing perturbed model copies for local client-side unlearning followed by server-side aggregation with harmonic denoising.
SEAT preserves epistemic abstention in LLMs during knowledge adaptation via sparse tuning and entity-perturbed KL regularization, yielding 18-101% better abstention on unknown queries while retaining near-perfect knowledge acquisition.
BARRIER applies interval arithmetic to SVD-based activation projections to create bounded forget regions that enable aggressive unlearning while providing formal protection for retain distributions via tail bounds on functional drift.
citing papers explorer
-
Inference-Time Machine Unlearning via Gated Activation Redirection
GUARD-IT performs machine unlearning in LLMs via input-dependent activation steering at inference time, matching or exceeding gradient-based baselines on TOFU and MUSE while preserving utility and working under quantization.
-
MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models
MPU is a framework that achieves privacy-preserving unlearning for LLMs by distributing perturbed model copies for local client-side unlearning followed by server-side aggregation with harmonic denoising.
-
SEAT: Sparse Entity-Aware Tuning for Knowledge Adaptation while Preserving Epistemic Abstention
SEAT preserves epistemic abstention in LLMs during knowledge adaptation via sparse tuning and entity-perturbed KL regularization, yielding 18-101% better abstention on unknown queries while retaining near-perfect knowledge acquisition.
-
BARRIER: Bounded Activation Regions for Robust Information Erasure
BARRIER applies interval arithmetic to SVD-based activation projections to create bounded forget regions that enable aggressive unlearning while providing formal protection for retain distributions via tail bounds on functional drift.