pith. machine review for the scientific record

arxiv: 2605.01735 · v1 · submitted 2026-05-03 · 💻 cs.CL

Less is More: Geometric Unlearning for LLMs with Minimal Data Disclosure

Pith reviewed 2026-05-10 15:58 UTC · model grok-4.3

classification 💻 cs.CL
keywords geometric unlearning · LLM privacy · synthetic data · ToFU benchmark · UnlearnPII · low-rank geometry · hidden state alignment · targeted forgetting

The pith

Geometric Unlearning lets LLMs forget specific private facts using only a handful of synthetic prompts while retaining general performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models deployed in practice must sometimes remove particular pieces of information to satisfy privacy or legal requirements after initial training. Existing unlearning techniques typically demand the full original training data and apply broad changes that can degrade overall capabilities. This work shows that a compact low-rank representation of safe behavior can be extracted from a small collection of reference prompts and then used to realign the model's internal hidden states at inference time. Alignment occurs through projection onto this safe geometry guided by lightweight synthetic anchor prompts, with an added regularizer to limit unintended shifts. The result on standard privacy benchmarks is targeted forgetting accompanied by little loss in non-target tasks, suggesting that unlearning need not require massive data or heavy retraining.

Core claim

The paper's central claim: Geometric Unlearning operates directly on prompt-time planning states. It first distills a low-rank geometry of desired safe behavior from a small set of safe reference prompts, then applies projection-based alignment of hidden representations using synthetic in-context anchors, together with a teacher-distillation regularizer on non-target anchors. The combination suppresses target information without any access to the original training corpus.
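Stated schematically (notation ours, not the paper's), the mechanism reads:

```latex
% Safe-geometry distillation: pool hidden states from m safe reference prompts
% and keep the top-k singular directions as the "safe" subspace.
H_{\mathrm{safe}} = [\,h_1, \dots, h_m\,], \qquad
H_{\mathrm{safe}} \approx U_k \Sigma_k V_k^{\top} \quad (\text{rank-}k\ \text{SVD})

% Projection-based alignment of a hidden planning state h at an anchor position:
P = U_k U_k^{\top}, \qquad \hat{h} = P\,h

% Teacher-distillation regularizer on non-target anchors x \in A_{\mathrm{non}}:
\mathcal{L} = \mathcal{L}_{\mathrm{align}}
  + \lambda \sum_{x \in A_{\mathrm{non}}}
    \mathrm{KL}\!\left(p_{\mathrm{teacher}}(\cdot \mid x)\,\|\,p_{\mathrm{student}}(\cdot \mid x)\right)
```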

What carries the argument

Geometric Unlearning (GU): extraction of a compact low-rank safe-behavior geometry from reference prompts followed by projection alignment of hidden planning states via synthetic anchors.
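A minimal numerical sketch of those two stages as read here — the hidden-state pooling, rank choice, and per-anchor replacement are illustrative assumptions, not the paper's specification:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 64, 12, 4  # hidden size, number of safe prompts, subspace rank

# Stage 1: distill a low-rank "safe geometry" from safe-prompt hidden states.
H_safe = rng.normal(size=(m, d))          # one pooled hidden state per safe prompt
U, S, Vt = np.linalg.svd(H_safe - H_safe.mean(0), full_matrices=False)
basis = Vt[:k]                            # (k, d): top-k right singular vectors

def project_to_safe(h, basis):
    """Orthogonally project a hidden state onto the safe subspace."""
    return (h @ basis.T) @ basis

# Stage 2: at an anchor position, replace the planning state by its projection.
h = rng.normal(size=d)
h_aligned = project_to_safe(h, basis)

# The projection is idempotent: projecting twice changes nothing.
assert np.allclose(project_to_safe(h_aligned, basis), h_aligned)
```

The discarded residual `h - h_aligned` is orthogonal to the safe basis, which is what "suppressing target directions outside the safe geometry" amounts to in this toy picture.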

If this is right

  • Strong suppression of target entities is achieved on ToFU and UnlearnPII benchmarks without original training data.
  • Non-target performance remains largely intact when alignment uses only minimal synthetic prompts.
  • Localized projection on hidden states avoids the broad gradient updates common in prior methods.
  • A teacher-distillation regularizer on synthetic non-target anchors limits collateral drift during unlearning.
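The last point — the teacher-distillation regularizer — can be sketched as a KL penalty on non-target anchors; the temperature and the direction of the KL below are assumptions, not the paper's choices:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z -= z.max()                    # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_kl(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) over next-token distributions at temperature T."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Identical logits on a non-target anchor incur zero penalty;
# drift away from the frozen teacher is penalized.
assert distill_kl([1.0, 2.0, 0.5], [1.0, 2.0, 0.5]) == 0.0
assert distill_kl([1.0, 2.0, 0.5], [2.0, 0.0, 1.5]) > 0.0
```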

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same low-rank alignment idea could be tested on unlearning tasks in non-language models such as vision or multimodal systems.
  • Organizations handling regulated data might adopt this approach to meet deletion requests without maintaining full training archives.
  • If the safe geometry remains stable across model scales, the method could support repeated unlearning cycles on the same base model.

Load-bearing premise

A low-rank geometry distilled from a few safe prompts can be projected onto hidden states to suppress chosen target information without broad utility loss or access to the original data.

What would settle it

Run the method on a model in which target facts are deliberately entangled across many dimensions in the hidden states; if target suppression fails or non-target accuracy drops sharply, the geometric alignment approach does not hold.
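A cheap proxy for that stress test: measure how much of a target fact's hidden-state direction survives projection onto the safe subspace — an encoding entangled with the safe geometry cannot be removed by orthogonal projection. Everything below is synthetic and illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 64, 4
# Orthonormal safe basis with columns spanning the safe subspace, (d, k).
basis, _ = np.linalg.qr(rng.normal(size=(d, k)))

def surviving_energy(v, basis):
    """||P v||^2 / ||v||^2 for the orthogonal projector P onto span(basis)."""
    proj = basis @ (basis.T @ v)
    return float(proj @ proj) / float(v @ v)

v_inside = basis[:, 0]        # a target direction lying fully inside the safe subspace
v_random = rng.normal(size=d) # a generic direction: mostly outside when k << d

# A fully entangled direction survives projection untouched (energy ratio 1):
assert abs(surviving_energy(v_inside, basis) - 1.0) < 1e-10
# A generic direction is mostly suppressed (expected ratio ~ k/d = 0.0625):
assert surviving_energy(v_random, basis) < 0.5
```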

Figures

Figures reproduced from arXiv: 2605.01735 by Chenchen Tan, Cunjian Chen, Longxiang Gao, Shujie Cui, Xinghao Li, Youyang Qu.

Figure 1. Conventional data-driven unlearning vs. the paper's original-corpus-free unlearning (GU). Top: standard unlearning pipelines fine-tune an LLM using target unlearning data Df and retention data Dr, which can re-expose original data and pose privacy risks. Bottom: the proposed approach uses only user-provided anchor points A to generate synthetic unlearning data Dvirt, and applies Geometric Unlearning on the LLM using Dvirt …
Figure 2. Overview of the proposed unlearning framework, structured into two parallel pathways to balance unlearning and preservation. The top Unlearning Pathway focuses on geometric unlearning by processing target anchors and synthetic unlearning data (Dvirt) through dynamic window masking. The aggregated hidden states (for topic z) are then projected within the Geometric Unlearning Engine to minim…
Figure 3. Privacy risk of MIAs across unlearning methods for two base models (unlearning 10% benchmark data, i.e., Forget-10), measured by the deviation from chance performance |AUC − 0.5| (lower is better). Each row contains three points computed from different MIA scoring metrics: Min-K, Reference, and Zlib. For each metric, AUC is the ROC area obtained when using the corresponding attack score to distinguish trai…
Figure 4. Effect of synthetic sample budget on unlearning, retaining, and runtime. 10 to 40 anchor-conditioned synthetic samples are constructed for unlearning, paired with an equal number of synthetic retain samples (1:1 forget and retain) for each setting. Reported: extraction strength (lower is better), model utility (higher is better), and training time for LLaMA-2-7B and LLaMA-3.2-1B.
Figure 6. Training dynamics under large-scale unlearning (20% forget split) for LLaMA3.2-1B and LLaMA3.2-8B, tracking accuracy on the unlearning and retaining sets over training epochs. Shaded bands indicate variability across runs.
Figure 5. Unlearning effectiveness and model utility trade-off across model scales and forget splits on the UnlearnPII benchmark. The y-axis reports target knowledge suppression (higher indicates better unlearning), and the x-axis reports retained model utility (higher indicates better utility preservation).
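Figure 3's privacy metric is easy to recompute from raw attack scores. The sketch below uses illustrative scores, not the paper's data; Min-K, Reference, and Zlib would each supply their own score lists. AUC is computed directly as the pairwise win rate of members over non-members:

```python
def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member (ties count 0.5)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            wins += 1.0 if m > n else (0.5 if m == n else 0.0)
    return wins / (len(member_scores) * len(nonmember_scores))

def mia_deviation(member_scores, nonmember_scores):
    """Deviation from chance |AUC - 0.5|; lower means less membership leakage."""
    return abs(auc(member_scores, nonmember_scores) - 0.5)

# Perfectly separated scores: maximal leakage (deviation 0.5).
assert mia_deviation([3, 4, 5], [0, 1, 2]) == 0.5
# Identical score distributions: chance-level attack (deviation 0.0).
assert mia_deviation([1, 2, 3], [1, 2, 3]) == 0.0
```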
Original abstract

As large language models (LLMs) are increasingly deployed in real-world systems, they must support post-hoc removal of specific content to meet privacy and governance requirements. This motivates selective unlearning, which suppresses information about a particular entity or topic while preserving the LLM's general utility. However, most existing LLM unlearning methods require access to the original training corpus and rely on output-level refusal tuning or broad gradient updates, creating a tension among unlearning strength, non-target preservation, and data availability. We propose Geometric Unlearning (GU), an approach that operates directly on the model's prompt-time planning states without access to the original training corpus. GU distills a compact, low-rank geometry of desired safe behavior from a small set of safe reference prompts, and uses lightweight anchor-in-context synthetic prompts to trigger localized, projection-based alignment of hidden planning representations to this safe geometry. A teacher-distillation regularizer on synthetic non-target anchors further reduces collateral drift. Across privacy-oriented unlearning benchmarks (ToFU and UnlearnPII), GU achieves strong target suppression with minimal impact on non-target performance, demonstrating that effective unlearning can be achieved with minimal synthetic data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Geometric Unlearning (GU) for selective unlearning in LLMs. GU distills a compact low-rank geometry of safe behavior from a small set of safe reference prompts and performs projection-based alignment of hidden planning states using lightweight anchor-in-context synthetic prompts, plus a teacher-distillation regularizer on non-target anchors. The method requires no access to the original training corpus. On the ToFU and UnlearnPII privacy benchmarks, GU is reported to achieve strong target suppression while preserving non-target performance, using only minimal synthetic data.

Significance. If the central geometric alignment mechanism is shown to reliably remove target encodings from hidden states, the work would be significant for practical LLM governance. It reduces reliance on original data and broad updates, offering a data-efficient alternative to existing unlearning techniques. The emphasis on low-rank distillation and synthetic anchors could influence future privacy-preserving methods, provided the approach generalizes beyond output-level metrics.

major comments (2)
  1. [§3] §3 (Geometric Unlearning procedure): The core claim that projection onto the distilled low-rank safe geometry suppresses target information encoded during original training is load-bearing, yet the manuscript provides no hidden-state probing, membership-inference, or subspace analysis to confirm that target signals are removed rather than merely masked at the output level. Without such verification, residual encodings in orthogonal subspaces cannot be ruled out.
  2. [§4] §4 (Benchmark evaluation): Results on ToFU and UnlearnPII report strong target suppression with minimal non-target degradation, but the evaluation relies on output accuracy and refusal metrics. No ablation isolating the contribution of the low-rank projection versus the synthetic anchors or regularizer is presented, making it difficult to attribute success specifically to the geometric component.
minor comments (2)
  1. [§3.1] The notation for the projection operator and the rank parameter in the low-rank geometry distillation should be defined more explicitly, ideally with a small illustrative equation.
  2. [Figure 1] Figure captions for the method overview diagram could more clearly label the flow from safe prompts to anchor alignment and the role of the teacher regularizer.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major point below and will incorporate revisions to strengthen the empirical support for the geometric mechanism.

Point-by-point responses
  1. Referee: [§3] §3 (Geometric Unlearning procedure): The core claim that projection onto the distilled low-rank safe geometry suppresses target information encoded during original training is load-bearing, yet the manuscript provides no hidden-state probing, membership-inference, or subspace analysis to confirm that target signals are removed rather than merely masked at the output level. Without such verification, residual encodings in orthogonal subspaces cannot be ruled out.

    Authors: We agree that direct verification of hidden-state suppression is important for substantiating the central mechanism. While the method explicitly aligns planning states via projection and the output-level results on ToFU and UnlearnPII demonstrate effective target suppression with preserved utility, we acknowledge the absence of internal analysis. In the revision we will add hidden-state probing, membership-inference attacks on the target subspace, and before/after subspace overlap metrics to show that target encodings are reduced rather than merely masked at the output. revision: yes

  2. Referee: [§4] §4 (Benchmark evaluation): Results on ToFU and UnlearnPII report strong target suppression with minimal non-target degradation, but the evaluation relies on output accuracy and refusal metrics. No ablation isolating the contribution of the low-rank projection versus the synthetic anchors or regularizer is presented, making it difficult to attribute success specifically to the geometric component.

    Authors: We thank the referee for highlighting the need for component-wise attribution. The current results show the full pipeline works with minimal data, but we agree that isolating the low-rank projection is necessary. In the revised manuscript we will include ablations that remove or replace the projection step (while retaining anchors and regularizer) and report the resulting changes in target suppression and non-target performance on both benchmarks. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper proposes Geometric Unlearning as a new method that distills a low-rank safe geometry from a small set of reference prompts and performs projection-based alignment on hidden states using synthetic anchors, with a teacher-distillation regularizer. All load-bearing steps (geometry distillation, projection alignment, and regularizer) are defined from first principles and external synthetic data rather than fitted to target outcomes or reduced to self-citations. Empirical results on ToFU and UnlearnPII are independent external benchmarks, not constructed by definition from the method inputs. No self-definitional, fitted-prediction, or uniqueness-imported circularity is present in the described chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The paper introduces novel constructs for unlearning that rest on assumptions about model internals and the efficacy of geometric operations, with no independent evidence provided in the abstract for these entities.

axioms (1)
  • domain assumption The internal hidden states of LLMs during prompt processing contain planning representations that can be aligned geometrically to achieve unlearning.
    This underpins the operation on prompt-time planning states without training data.
invented entities (2)
  • low-rank geometry of desired safe behavior no independent evidence
    purpose: Compact representation of safe behavior distilled from reference prompts for alignment.
    Introduced as the core of the GU method.
  • anchor-in-context synthetic prompts no independent evidence
    purpose: Lightweight prompts to trigger localized projection-based alignment.
    Used to minimize data disclosure.

pith-pipeline@v0.9.0 · 5514 in / 1295 out tokens · 79405 ms · 2026-05-10T15:58:11.548449+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

51 extracted references · 7 canonical work pages · 1 internal anchor
