Less is More: Geometric Unlearning for LLMs with Minimal Data Disclosure
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
As large language models (LLMs) are increasingly deployed in real-world systems, they must support post-hoc removal of specific content to meet privacy and governance requirements. This motivates selective unlearning, which suppresses information about a particular entity or topic while preserving the LLM's general utility. However, most existing LLM unlearning methods require access to the original training corpus and rely on output-level refusal tuning or broad gradient updates, creating a tension among unlearning strength, non-target preservation, and data availability. We propose Geometric Unlearning (GU), an approach that operates directly on the model's prompt-conditioned hidden states without access to the original training corpus. Specifically, GU distills a compact, low-rank safe-behavior subspace from a small set of safe reference prompts and uses lightweight anchor-in-context synthetic prompts to trigger localized, projection-based alignment of hidden representations to this safe subspace. A teacher-distillation regularizer on synthetic non-target anchors further reduces collateral drift. Across privacy-oriented unlearning benchmarks (ToFU and UnlearnPII), GU achieves strong target suppression with minimal impact on non-target performance, demonstrating that effective unlearning can be achieved with minimal synthetic data.
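The projection step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the safe-behavior subspace is distilled, so an uncentered truncated SVD is assumed here, random arrays stand in for real prompt-conditioned hidden states, and the names `safe_subspace`, `project`, and `alignment_loss` are hypothetical.

```python
import numpy as np

def safe_subspace(ref_states: np.ndarray, rank: int) -> np.ndarray:
    """Orthonormal basis (d x rank) for the top-`rank` directions of the
    safe reference hidden states (n x d). Assumption: truncated SVD."""
    # Rows of vt are right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(ref_states, full_matrices=False)
    return vt[:rank].T

def project(hidden: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Orthogonally project hidden states (m x d) onto span(basis)."""
    return hidden @ basis @ basis.T

def alignment_loss(hidden: np.ndarray, basis: np.ndarray) -> float:
    """Mean squared distance between hidden states and their projection."""
    residual = hidden - project(hidden, basis)
    return float(np.mean(np.sum(residual ** 2, axis=1)))

rng = np.random.default_rng(0)
d, r = 16, 3
# Stand-in "safe" states that lie in a random 3-dimensional subspace.
true_basis = np.linalg.qr(rng.normal(size=(d, r)))[0]
safe = rng.normal(size=(32, r)) @ true_basis.T
basis = safe_subspace(safe, rank=r)

# Stand-in hidden states for target prompts, before and after projection.
target = rng.normal(size=(4, d))
print("loss before projection:", alignment_loss(target, basis))
print("loss after projection: ", alignment_loss(project(target, basis), basis))
```

In an actual unlearning setup the loss would be minimized with respect to model parameters rather than applied to fixed arrays; the sketch only shows the geometry of aligning representations to a low-rank subspace.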
fields
cs.AI 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
citing papers explorer
- Null-Space Constrained Low-Rank Adaptation for Response-Specified Large Language Model Unlearning
NSRU constrains LoRA updates via null-space projection of retain subspaces to jointly optimize safe-target learning, undesired-response suppression, and retention in LLM unlearning.
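As a rough illustration of what constraining a low-rank update to a retain null space can mean, the sketch below projects the input side of a LoRA-style update so that it has no effect on inputs lying in the span of the retain activations. The summary gives no implementation details, so the way the retain subspace is estimated (SVD with a tolerance), the placement of the projector, and the name `null_space_projector` are all assumptions rather than the NSRU method itself.

```python
import numpy as np

def null_space_projector(retain_acts: np.ndarray, tol: float = 1e-8) -> np.ndarray:
    """Projector (d x d) onto the orthogonal complement of the subspace
    spanned by the retain activations (n x d)."""
    _, s, vt = np.linalg.svd(retain_acts, full_matrices=False)
    keep = vt[s > tol * s.max()]  # directions occupied by the retain data
    return np.eye(retain_acts.shape[1]) - keep.T @ keep

rng = np.random.default_rng(0)
d_in, d_out, r = 12, 8, 2

# Stand-in retain inputs that occupy a rank-4 subspace of the input space.
retain = rng.normal(size=(20, 4)) @ rng.normal(size=(4, d_in))
P = null_space_projector(retain)

# LoRA-style factors; the constrained update is delta_W = B (A P).
B = rng.normal(size=(d_out, r))
A = rng.normal(size=(r, d_in))
delta = B @ (A @ P)

# Retain inputs are (numerically) unaffected; other inputs still change.
x_new = rng.normal(size=d_in)
print("max effect on retain inputs:", np.abs(retain @ delta.T).max())
print("effect on a new input:      ", np.linalg.norm(delta @ x_new))
```

Because the projector multiplies the rank-`r` factor `A`, the constrained update keeps rank at most `r`.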