Less is More: Geometric Unlearning for LLMs with Minimal Data Disclosure

2026 · cs.CL · arXiv 2605.01735

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

As large language models (LLMs) are increasingly deployed in real-world systems, they must support post-hoc removal of specific content to meet privacy and governance requirements. This motivates selective unlearning, which suppresses information about a particular entity or topic while preserving the LLM's general utility. However, most existing LLM unlearning methods require access to the original training corpus and rely on output-level refusal tuning or broad gradient updates, creating a tension among unlearning strength, non-target preservation, and data availability. We propose Geometric Unlearning (GU), an approach that operates directly on the model's prompt-conditioned hidden states without access to the original training corpus. Specifically, GU distills a compact, low-rank safe-behavior subspace from a small set of safe reference prompts and uses lightweight anchor-in-context synthetic prompts to trigger localized, projection-based alignment of hidden representations to this safe subspace. A teacher-distillation regularizer on synthetic non-target anchors further reduces collateral drift. Across privacy-oriented unlearning benchmarks (ToFU and UnlearnPII), GU achieves strong target suppression with minimal impact on non-target performance, demonstrating that effective unlearning can be achieved with minimal synthetic data.
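The projection-based alignment described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say how the safe-behavior subspace is constructed, so this sketch assumes it is fit with a truncated SVD (PCA) over hidden states collected from safe reference prompts. The function names `safe_subspace`, `project`, and `alignment_loss` are hypothetical and not taken from the paper; the teacher-distillation regularizer is not shown.

```python
import numpy as np


def safe_subspace(safe_states: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Fit a rank-`rank` affine subspace to safe hidden states (assumed: PCA via SVD).

    safe_states: array of shape (n_prompts, d), one hidden state per safe prompt.
    Returns (mean, basis), where basis has shape (d, rank) with orthonormal columns.
    """
    mean = safe_states.mean(axis=0)
    # Rows of vt are the principal directions of the centered states.
    _, _, vt = np.linalg.svd(safe_states - mean, full_matrices=False)
    return mean, vt[:rank].T


def project(h: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Orthogonally project hidden states h, shape (..., d), onto the safe subspace."""
    return mean + (h - mean) @ basis @ basis.T


def alignment_loss(h: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> float:
    """Mean squared distance between hidden states and their safe-subspace projection."""
    residual = h - project(h, mean, basis)
    return float(np.mean(np.sum(residual ** 2, axis=-1)))
```

In a training loop, a loss of this shape would be minimized only on hidden states produced by target (anchor-in-context) prompts, which is one way to read "localized" alignment; that pairing is an interpretation of the abstract, not a confirmed detail of GU.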

representative citing papers

Null-Space Constrained Low-Rank Adaptation for Response-Specified Large Language Model Unlearning

cs.AI · 2026-06-09 · unverdicted · novelty 5.0

NSRU constrains LoRA updates via null-space projection of retain subspaces to jointly optimize safe-target learning, undesired-response suppression, and retention in LLM unlearning.

citing papers explorer

Showing 1 of 1 citing paper.

Null-Space Constrained Low-Rank Adaptation for Response-Specified Large Language Model Unlearning

cs.AI · 2026-06-09 · unverdicted · none · ref 42 · internal anchor
