
arxiv: 2605.01735 · v2 · pith:LPH6T5NMnew · submitted 2026-05-03 · 💻 cs.CL

Less is More: Geometric Unlearning for LLMs with Minimal Data Disclosure

Pith reviewed 2026-07-01 00:27 UTC · model grok-4.3

classification 💻 cs.CL
keywords LLM unlearning · privacy preservation · hidden state alignment · synthetic data · selective forgetting · geometric methods

The pith

Geometric Unlearning suppresses specific LLM knowledge by projecting hidden states onto a low-rank safe subspace distilled from minimal safe prompts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Geometric Unlearning to let language models forget targeted entities or topics after deployment while keeping general capabilities intact. It works by first extracting a compact safe-behavior subspace from a handful of reference prompts, then aligning the model's internal representations for synthetic anchors to that subspace through targeted projection. This process avoids any need for the original training data and adds a regularizer to limit unwanted side effects on unrelated content. The approach is evaluated on privacy benchmarks where it shows effective forgetting of targets with little loss elsewhere. A sympathetic reader would care because it addresses the practical tension between privacy rules and the difficulty of accessing past training data for large models.

Core claim

Geometric Unlearning distills a compact low-rank safe-behavior subspace from a small set of safe reference prompts and performs localized projection-based alignment of prompt-conditioned hidden states onto this subspace using lightweight synthetic anchors, with a teacher-distillation regularizer on non-target anchors to limit collateral drift, achieving target suppression without access to the original training corpus.
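
The two geometric steps in this claim (distilling a low-rank subspace, then aligning hidden states to it) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function names, the use of PCA via SVD (suggested only by the Figure 5 caption's mention of PCA rank), the squared-residual alignment loss, and the random arrays standing in for hidden states. It is not the paper's implementation.

```python
import numpy as np

def fit_safe_subspace(safe_acts, k):
    """PCA on safe-reference activations of shape (n, d).

    Returns the mean (d,), an orthonormal basis (d, k), and the
    fraction of variance explained by the top-k directions.
    """
    mean = safe_acts.mean(axis=0)
    centered = safe_acts - mean
    # Rows of vt are principal directions, sorted by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k].T
    explained = float((s[:k] ** 2).sum() / (s ** 2).sum())
    return mean, basis, explained

def project(h, mean, basis):
    """Orthogonal projection of hidden states onto the affine safe subspace."""
    return mean + (h - mean) @ basis @ basis.T

def alignment_loss(h, mean, basis):
    """Mean squared distance between hidden states and their projection."""
    resid = h - project(h, mean, basis)
    return float((resid ** 2).sum(axis=-1).mean())

# Hypothetical stand-ins for real activations: 32 "safe" states that
# lie near a 3-dimensional subspace of a 16-dimensional space.
rng = np.random.default_rng(0)
d, k = 16, 3
latent = rng.normal(size=(32, k))
mixing = rng.normal(size=(k, d))
safe_acts = latent @ mixing + 0.01 * rng.normal(size=(32, d))

mean, basis, explained = fit_safe_subspace(safe_acts, k)

# Hypothetical hidden states for target-anchor prompts.
target_h = rng.normal(size=(4, d))
loss_before = alignment_loss(target_h, mean, basis)
loss_after = alignment_loss(project(target_h, mean, basis), mean, basis)
```

In this sketch the loss falls to numerical zero once the states are replaced by their projections. In an actual training loop, the residual would presumably be minimized by updating model parameters rather than by overwriting activations.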

What carries the argument

The low-rank safe-behavior subspace distilled from safe reference prompts carries the argument: it serves as the target for localized projection-based alignment of hidden representations.

If this is right

  • Target suppression remains strong on the ToFU and UnlearnPII benchmarks while non-target performance stays close to the original model.
  • Unlearning succeeds using only synthetic data and no access to the original training corpus.
  • The method reduces collateral drift through the added regularizer on synthetic non-target anchors.
  • Localized projection on hidden states replaces the need for broad gradient updates or refusal tuning.
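
The regularizer mentioned in the third bullet can be sketched as a KL term between a frozen teacher and the model being edited, evaluated on non-target anchors. The KL direction, the weight `lam`, and all function names below are assumptions for illustration; this summary does not state how the paper defines or weights the term.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def distill_kl(teacher_logits, student_logits):
    """Mean KL(teacher || student) over positions; inputs have shape (n, vocab)."""
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return float((np.exp(t_logp) * (t_logp - s_logp)).sum(axis=-1).mean())

def total_loss(align_term, teacher_logits, student_logits, lam=1.0):
    """Alignment term on target anchors plus a weighted drift penalty
    on non-target anchors. `lam` is a hypothetical weight."""
    return align_term + lam * distill_kl(teacher_logits, student_logits)

# Hypothetical logits for two non-target positions over a 3-token vocabulary.
teacher = np.array([[2.0, 0.5, -1.0], [0.0, 1.0, 0.0]])
student = np.array([[1.5, 0.7, -0.8], [0.2, 0.9, 0.1]])
drift = distill_kl(teacher, student)
```

The penalty is zero when the edited model reproduces the teacher's distribution on non-target anchors and grows as the two diverge, which is the sense in which such a term limits collateral drift.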

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same projection approach might extend to forgetting specific capabilities rather than just factual entities if suitable reference subspaces can be identified.
  • Deployment pipelines could integrate this form of unlearning as a lightweight post-training step triggered by new privacy requests.
  • The reliance on synthetic anchors suggests that generating high-quality non-target examples becomes a key practical variable for scaling the technique.

Load-bearing premise

That alignment to a subspace derived from a small number of safe prompts will selectively remove target information without causing broader unintended changes in the model's behavior.

What would settle it

An experiment showing that after unlearning, the model still produces the targeted private information in response to direct or indirect queries about the suppressed entity, or exhibits measurable degradation on standard non-target tasks.
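
A minimal sketch of the first half of such a check follows. It assumes leakage is detected by case-insensitive substring matching against known secret strings, and that the model is exposed as a plain text-to-text callable. The `toy_model`, the prompts, and the secret are fictional placeholders, not material from the paper or its benchmarks.

```python
from typing import Callable, Iterable

def leak_rate(generate: Callable[[str], str],
              prompts: Iterable[str],
              secrets: Iterable[str]) -> float:
    """Fraction of prompts whose completion contains any secret string."""
    needles = [s.lower() for s in secrets]
    prompt_list = list(prompts)
    if not prompt_list:
        return 0.0
    hits = sum(
        any(n in generate(p).lower() for n in needles) for p in prompt_list
    )
    return hits / len(prompt_list)

# Hypothetical stand-in for an unlearned model: it refuses direct
# questions but still leaks under one indirect phrasing.
def toy_model(prompt: str) -> str:
    if "author of" in prompt:
        return "That would be Jane Q. Example, born in 1970."
    return "I don't have information about that."

direct = ["Who is Jane Q. Example?", "When was Jane Q. Example born?"]
indirect = ["Who is the author of the novel 'Placeholder'?"]
secrets = ["born in 1970"]

direct_rate = leak_rate(toy_model, direct, secrets)
indirect_rate = leak_rate(toy_model, indirect, secrets)
```

Reporting the direct and indirect rates separately matters here: a model can score zero on direct queries while still leaking under rephrasing, which is the failure this section describes.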

Figures

Figures reproduced from arXiv: 2605.01735 by Chenchen Tan, Cunjian Chen, Longxiang Gao, Shujie Cui, Xinghao Li, Youyang Qu.

Figure 1
Figure 1: Conventional data-driven unlearning vs. our original-corpus-free unlearning (GU). Top: Standard unlearning pipelines fine-tune an LLM using target unlearning data Df and retention data Dr, which can re-expose original data and pose privacy risks. Bottom: Our approach uses only user-provided anchor points A to generate synthetic unlearning data Dvirt, and applies Geometric Unlearning on the LLM using Dvirt …
Figure 2
Figure 2: Overview of the proposed unlearning framework. The framework is structured into two parallel pathways to balance unlearning and preservation. The top Unlearning Pathway focuses on geometric unlearning by processing target anchors and synthetic unlearning data (Dvirt) through dynamic window masking. The aggregated hidden states (for topic z) are then projected within the Geometric Unlearning Engine to minim…
Figure 3
Figure 3: Privacy risk of MIAs across unlearning methods for two base models (unlearning 10% benchmark data, i.e., Forget-10) measured by the deviation from chance performance |AUC − 0.5| (lower is better). Each row contains three points computed from different MIA scoring metrics: Min-K, Reference, and Zlib. For each metric, AUC is the ROC area obtained when using the corresponding attack score to distinguish trai…
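
The deviation-from-chance metric in the Figure 3 caption can be computed as in the sketch below. It assumes AUC is obtained from pairwise comparisons of attack scores (the Mann-Whitney formulation), with ties counted as one half. The function names and score values are hypothetical and are not taken from the paper.

```python
import numpy as np

def roc_auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member,
    counting ties as one half."""
    m = np.asarray(member_scores, dtype=float)[:, None]
    n = np.asarray(nonmember_scores, dtype=float)[None, :]
    return float(((m > n) + 0.5 * (m == n)).mean())

def mia_deviation(member_scores, nonmember_scores):
    """|AUC - 0.5|: 0 means the attack is at chance, 0.5 means it separates perfectly."""
    return abs(roc_auc(member_scores, nonmember_scores) - 0.5)

# Hypothetical attack scores: a fully separating attack and a chance-level one.
separating = mia_deviation([3.0, 4.0], [1.0, 2.0])
at_chance = mia_deviation([1.0, 2.0], [1.0, 2.0])
```

Taking the absolute value treats an attack with AUC well below 0.5 as equally informative, since inverting its score would yield an attack above chance.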
Figure 4
Figure 4: Effect of synthetic sample budget on unlearning, retaining, and runtime. We construct 10 to 40 anchor-conditioned synthetic samples for unlearning, paired with an equal number of synthetic retain samples (1:1 forget and retain) for each setting. We report extraction strength (lower is better), model utility (higher is better), and training time for LLaMA-2-7B and LLaMA-3.2-1B.
Figure 5
Figure 5: Analysis of the safe-behavior subspace and PCA rank selection on the last two layers of LLaMA2-7B. Left: the PCA explained-variance spectrum of safe-reference activations shows that the dominant safe-behavior variation is captured by a compact low-rank subspace. Right: the forgetting-utility trade-off across PCA ranks, where each point is annotated with the corresponding rank k and convergence epoch e.
Figure 6
Figure 6: Unlearning effectiveness and model utility trade-off across model scales and forget splits on the UnlearnPII benchmark. The y-axis reports target knowledge accuracy (close to 0 indicates better unlearning), and the x-axis reports retained model utility (higher indicates better utility preservation).
Figure 7
Figure 7: Training dynamics under large-scale unlearning (20% forget split) for LLaMA3.2-1B and LLaMA3.1-8B. We track accuracy on the unlearning and retaining sets over training epochs. Shaded bands indicate variability across runs.
Figure 8
Figure 8(b) further shows that GU scales to LLaMA-13B: the forget-set ROUGE decreases to 0.185 and 0.127 under Forget-05 and Forget-10, respectively, while the remaining-data ROUGE only moderately declines. These results indicate that GU generalizes across both architecture and scale, maintaining effective unlearning with limited degradation of non-target performance.
Original abstract

As large language models (LLMs) are increasingly deployed in real-world systems, they must support post-hoc removal of specific content to meet privacy and governance requirements. This motivates selective unlearning, which suppresses information about a particular entity or topic while preserving the LLM's general utility. However, most existing LLM unlearning methods require access to the original training corpus and rely on output-level refusal tuning or broad gradient updates, creating a tension among unlearning strength, non-target preservation, and data availability. We propose Geometric Unlearning (GU), an approach that operates directly on the model's prompt-conditioned hidden states without access to the original training corpus. Specifically, GU distills a compact, low-rank safe-behavior subspace from a small set of safe reference prompts and uses lightweight anchor-in-context synthetic prompts to trigger localized, projection-based alignment of hidden representations to this safe subspace. A teacher-distillation regularizer on synthetic non-target anchors further reduces collateral drift. Across privacy-oriented unlearning benchmarks (ToFU and UnlearnPII), GU achieves strong target suppression with minimal impact on non-target performance, demonstrating that effective unlearning can be achieved with minimal synthetic data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes Geometric Unlearning (GU) for selective unlearning in LLMs without access to the original training corpus. GU distills a compact low-rank safe-behavior subspace from a small set of safe reference prompts, then applies localized projection-based alignment of hidden states triggered by lightweight anchor-in-context synthetic prompts, with a teacher-distillation regularizer on synthetic non-target anchors to limit drift. Evaluations on ToFU and UnlearnPII benchmarks claim strong target suppression with minimal non-target impact, showing effective unlearning is possible with minimal synthetic data.

Significance. If the central claims hold under scrutiny, the work would be significant for privacy-oriented LLM governance: it offers a data-minimal alternative to corpus-dependent or broad-gradient unlearning methods while preserving utility, potentially easing the tension between unlearning strength and data availability.

major comments (2)
  1. [Abstract] The abstract provides no equations, implementation details, or quantitative results beyond high-level claims, so it is impossible to assess whether the described projection and distillation steps actually support the suppression claim.
  2. [Method] The construction assumes that target-specific information is linearly separable from safe behavior in hidden-state space, and that the low-rank subspace distilled from limited safe prompts will isolate and erase it. No analysis or test is provided to establish this separability for entangled factual knowledge, so the subspace could capture only generic refusal patterns while leaving target facts intact under rephrasing or indirect prompting.
minor comments (1)
  1. The description of how synthetic anchors are generated and how the teacher-distillation regularizer is weighted should be expanded for reproducibility.
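
One way to make the separability concern in the second major comment testable is a held-out linear probe, sketched below. The least-squares probe, the synthetic data, and all names are assumptions for illustration; the paper is not stated to run this test. High probe accuracy would show only that target and safe states are linearly distinguishable, not that projection removes the underlying facts.

```python
import numpy as np

def add_bias(x):
    """Append a constant column so the probe can learn an offset."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

def probe_accuracy(train_x, train_y, test_x, test_y):
    """Fit a least-squares linear probe on +1/-1 labels and score it on held-out data."""
    w, *_ = np.linalg.lstsq(add_bias(train_x), train_y, rcond=None)
    pred = np.sign(add_bias(test_x) @ w)
    return float((pred == test_y).mean())

# Hypothetical hidden states: +1 marks target-related prompts, -1 marks
# safe prompts; the two classes differ along a single direction.
rng = np.random.default_rng(0)
d, n = 16, 200
labels = np.where(rng.random(n) < 0.5, 1.0, -1.0)
states = rng.normal(size=(n, d))
states[:, 0] += 3.0 * labels

split = n // 2
acc = probe_accuracy(states[:split], labels[:split],
                     states[split:], labels[split:])
```

Running the same probe on hidden states from rephrased or indirect prompts would speak more directly to the referee's point than accuracy on the original phrasing alone.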

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed review. We respond to each major comment below, indicating planned revisions to the manuscript.

  1. Referee: [Abstract] The abstract provides no equations, implementation details, or quantitative results beyond high-level claims, so it is impossible to assess whether the described projection and distillation steps actually support the suppression claim.

    Authors: We agree with this observation. The revised abstract will incorporate a concise description of the projection-based alignment and key quantitative results from the ToFU and UnlearnPII benchmarks to substantiate the claims. revision: yes

  2. Referee: [Method] The construction assumes that target-specific information is linearly separable from safe behavior in hidden-state space, and that the low-rank subspace distilled from limited safe prompts will isolate and erase it. No analysis or test is provided to establish this separability for entangled factual knowledge, so the subspace could capture only generic refusal patterns while leaving target facts intact under rephrasing or indirect prompting.

    Authors: While the method is supported by strong empirical performance on the benchmarks, we acknowledge that an explicit analysis of the linear separability assumption is absent from the current manuscript. We will add such an analysis, including tests for robustness against rephrasing and indirect prompts, in the revised version. revision: yes

Circularity Check

0 steps flagged

No circularity: method is a self-contained algorithmic proposal


The paper introduces Geometric Unlearning as a new procedure that distills a low-rank subspace from safe reference prompts and performs projection alignment on synthetic anchors, with a teacher regularizer. No equations, parameter-fitting steps, or self-citations are shown in the abstract or described claims that reduce any prediction or uniqueness result to the inputs by construction. The central claims rest on the empirical performance of the proposed algorithm rather than any definitional or self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; all details are high-level.

pith-pipeline@v0.9.1-grok · 5743 in / 893 out tokens · 29662 ms · 2026-07-01T00:27:07.694021+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Null-Space Constrained Low-Rank Adaptation for Response-Specified Large Language Model Unlearning

    cs.AI · 2026-06 · unverdicted · novelty 5.0

    NSRU constrains LoRA updates via null-space projection of retain subspaces to jointly optimize safe-target learning, undesired-response suppression, and retention in LLM unlearning.
