Maximizing Local Entropy Where It Matters: Prefix-Aware Localized LLM Unlearning
Abstract
Machine unlearning aims to remove sensitive knowledge from Large Language Models (LLMs) while preserving general utility. However, existing approaches typically treat all tokens in a response indiscriminately and enforce uncertainty over the entire vocabulary. This global treatment causes unnecessary utility degradation and extends optimization to content-agnostic regions. To address these limitations, we propose PALU (Prefix-Aware Localized Unlearning), a framework driven by a local entropy-maximization objective along both the temporal and vocabulary dimensions. PALU builds on two findings: (i) suppressing the sensitive prefix alone is sufficient to sever the causal generation link, and (ii) flattening only the top-$k$ logits is adequate to maximize uncertainty in the critical subspace. These findings allow PALU to avoid redundant optimization across the full vocabulary and parameter space while minimizing collateral damage to general model performance. Extensive experiments show that PALU achieves superior forgetting efficacy and utility preservation compared to state-of-the-art baselines. Our code is available at https://github.com/nxZhai/PALU.
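To make the two localizations concrete, below is a minimal PyTorch sketch of a flattening loss that is restricted to the top-$k$ logits and applied only at sensitive-prefix positions. The function name `topk_flatten_loss`, the use of a KL-to-uniform objective as the flattening term, and the masking convention are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

```python
# Minimal sketch (not the authors' code): flatten the top-k logits toward a
# uniform distribution, but only at positions marked as the sensitive prefix.
# The rest of the vocabulary and the rest of the response are left untouched.
import torch
import torch.nn.functional as F

def topk_flatten_loss(logits: torch.Tensor,
                      prefix_mask: torch.Tensor,
                      k: int = 10) -> torch.Tensor:
    """logits: (batch, seq_len, vocab); prefix_mask: (batch, seq_len) bool,
    True at positions belonging to the sensitive prefix."""
    topk_logits, _ = logits.topk(k, dim=-1)          # (B, T, k)
    log_probs = F.log_softmax(topk_logits, dim=-1)   # renormalize over the top-k subspace
    uniform = torch.full_like(log_probs, 1.0 / k)    # flat target over k entries
    # KL(uniform || p_topk), zero exactly when the top-k logits are flat.
    kl = F.kl_div(log_probs, uniform, reduction="none").sum(-1)  # (B, T)
    masked = kl * prefix_mask.float()
    return masked.sum() / prefix_mask.float().sum().clamp(min=1.0)
```

Minimizing this term drives the top-$k$ logits toward a flat distribution while ignoring the vocabulary tail and all non-prefix positions, which mirrors the abstract's "critical subspace" and prefix-only localizations.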
Forward citations
Cited by 2 Pith papers
- Mitigating Error Amplification in Fast Adversarial Training
  DDG dynamically adjusts perturbation magnitude and supervision strength in fast adversarial training according to sample confidence at the ground-truth class, mitigating catastrophic overfitting and the robustness-acc... (see the first sketch after this list)
- VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models
  VC-Soup uses a cosine-similarity consistency metric to filter data, trains value-consistent policies, and applies linear merging with Pareto filtering to improve multi-value LLM alignment trade-offs. (see the second sketch after this list)
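The DDG summary above describes confidence-dependent perturbation sizing in fast adversarial training. As a rough illustration only, here is one way such a rule could look in a single-step (FGSM-style) inner loop; the scaling direction, the function name `confidence_scaled_fgsm`, and the image-tensor shape are all assumptions, since the one-line summary does not specify them.

```python
# Illustrative sketch only, NOT DDG's actual method: scale each sample's
# FGSM perturbation budget by the model's confidence at its ground-truth class.
import torch
import torch.nn.functional as F

def confidence_scaled_fgsm(model, x, y, eps_max=8 / 255):
    """x: images (N, C, H, W); y: labels (N,). Returns adversarial examples
    whose per-sample budget grows with ground-truth confidence (assumption)."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    # Confidence at the ground-truth class, one scalar per sample.
    conf = F.softmax(logits, dim=-1).gather(1, y.unsqueeze(1)).squeeze(1).detach()
    loss = F.cross_entropy(logits, y)
    grad, = torch.autograd.grad(loss, x)
    eps = (eps_max * conf).view(-1, 1, 1, 1)        # per-sample budget
    return (x + eps * grad.sign()).clamp(0, 1).detach()
```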
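Similarly, the VC-Soup summary mentions a cosine-similarity consistency metric for data filtering. The sketch below shows a generic version of such a filter; the anchor-embedding setup, the threshold value, and the function name are hypothetical, intended only to make the idea concrete.

```python
# Illustrative sketch of cosine-similarity data filtering; the anchor and
# threshold are assumptions, not VC-Soup's actual configuration.
import torch
import torch.nn.functional as F

def filter_by_value_consistency(embeddings: torch.Tensor,
                                value_anchor: torch.Tensor,
                                threshold: float = 0.7) -> torch.Tensor:
    """embeddings: (N, d) example embeddings; value_anchor: (d,) embedding of
    the target value. Returns a boolean mask keeping examples whose cosine
    similarity to the anchor exceeds the threshold."""
    sims = F.cosine_similarity(embeddings, value_anchor.unsqueeze(0), dim=-1)
    return sims > threshold
```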