pith. sign in

arxiv: 2603.12275 · v1 · submitted 2026-02-21 · 💻 cs.CL · cs.LG

GONE: Structural Knowledge Unlearning via Neighborhood-Expanded Distribution Shaping

Pith reviewed 2026-05-15 21:03 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords knowledge unlearninglarge language modelsknowledge graphsgraph obliviondistribution shapingreasoning leakagestructured knowledgelocality preservation
0
0 comments X

The pith

A graph-based method removes specific facts from LLMs while blocking reasoning leakage through connected knowledge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GONE, a benchmark that tests unlearning on knowledge-graph facts rather than isolated sentences. It identifies three separable effects: direct fact removal, leakage through multi-hop reasoning, and broad catastrophic forgetting. NEDS expands each target fact to its graph neighbors, then reshapes the model's output distribution to place a sharp boundary between the fact and its semantic neighborhood. This produces complete removal of the chosen fact together with high retention of related but non-targeted knowledge. A sympathetic reader would care because current unlearning techniques leave relational knowledge intact, allowing models to reconstruct deleted facts through inference.

Core claim

The central claim is that graph connectivity supplies the right set of anchor neighbors; shaping the distribution over that expanded neighborhood draws a precise decision boundary that erases the direct fact, prevents reasoning-based leakage to correlated nodes, and leaves unrelated knowledge untouched.

What carries the argument

Neighborhood-Expanded Distribution Shaping (NEDS), which uses graph connectivity to select correlated neighbors and then enforces a decision boundary separating the forgotten fact from its semantic neighborhood.

If this is right

  • LLMs can be made to forget specific relational facts without erasing the ability to reason over the remaining graph.
  • Unlearning evaluation can now separately measure direct removal, reasoning leakage, and general forgetting on the same benchmark.
  • The same neighborhood-expansion step raises unlearning efficacy to 1.000 and locality to 0.839 on LLaMA-3-8B and Mistral-7B.
  • Graph-aware distribution shaping offers a template for handling other structured data formats beyond simple knowledge graphs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The boundary-shaping idea could be tested on citation graphs or molecular graphs where facts are also linked by explicit relations.
  • Regulatory requests to delete specific training facts could be implemented by first extracting the relevant subgraph and then running NEDS on it.
  • If the method scales, it would let model owners demonstrate compliance with data-deletion rules without retraining from scratch.

Load-bearing premise

Graph links reliably mark the exact set of neighbors whose distribution shaping will cleanly separate direct fact removal from any reasoning-based reconstruction.

What would settle it

After applying NEDS to a target KG fact, the model still produces correct answers to questions that chain that fact with its immediate neighbors.

read the original abstract

Unlearning knowledge is a pressing and challenging task in Large Language Models (LLMs) because of their unprecedented capability to memorize and digest training data at scale, raising more significant issues regarding safety, privacy, and intellectual property. However, existing works, including parameter editing, fine-tuning, and distillation-based methods, are all focused on flat sentence-level data but overlook the relational, multi-hop, and reasoned knowledge in naturally structured data. In response to this gap, this paper introduces Graph Oblivion and Node Erasure (GONE), a benchmark for evaluating knowledge unlearning over structured knowledge graph (KG) facts in LLMs. This KG-based benchmark enables the disentanglement of three effects of unlearning: direct fact removal, reasoning-based leakage, and catastrophic forgetting. In addition, Neighborhood-Expanded Distribution Shaping (NEDS), a novel unlearning framework, is designed to leverage graph connectivity and identify anchor correlated neighbors, enforcing a precise decision boundary between the forgotten fact and its semantic neighborhood. Evaluations on LLaMA-3-8B and Mistral-7B across multiple knowledge editing and unlearning methods showcase NEDS's superior performance (1.000 on unlearning efficacy and 0.839 on locality) on GONE and other benchmarks. Code is available at https://anonymous.4open.science/r/GONE-4679/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the GONE benchmark for evaluating knowledge unlearning on structured knowledge-graph facts in LLMs, designed to disentangle direct fact removal, reasoning-based leakage, and catastrophic forgetting. It proposes the NEDS framework, which uses graph connectivity to expand neighborhoods and shape output distributions around anchor neighbors, claiming to enforce a precise decision boundary; evaluations on LLaMA-3-8B and Mistral-7B report superior results with 1.000 unlearning efficacy and 0.839 locality versus prior editing and unlearning baselines.

Significance. If the alignment between KG connectivity and LLM-internal semantic correlations holds, the work would meaningfully advance unlearning for relational and multi-hop knowledge, addressing a clear gap in sentence-level methods. The benchmark's explicit disentanglement of leakage effects and the open code release are concrete strengths that would support follow-on research in safety and privacy applications.

major comments (2)
  1. [Abstract] Abstract: the reported 1.000 efficacy and 0.839 locality scores are presented without any description of baseline implementations, exact metric definitions, graph-construction details, or controls for data selection; this absence makes the superiority claim only partially verifiable and is load-bearing for the central experimental claim.
  2. [Abstract] Abstract (NEDS description): the method assumes that structural neighbors identified via KG connectivity are exactly the semantically correlated facts used by the LLM for reasoning-based leakage; no ablation, verification, or analysis is supplied to test this alignment, which directly determines whether distribution shaping fully disentangles the three effects without residual leakage or locality degradation.
minor comments (1)
  1. [Abstract] Abstract: the code link is given as an anonymous repository; replace with a permanent, non-anonymous URL in the camera-ready version.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below with point-by-point responses, indicating planned revisions where appropriate to improve clarity and verifiability.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the reported 1.000 efficacy and 0.839 locality scores are presented without any description of baseline implementations, exact metric definitions, graph-construction details, or controls for data selection; this absence makes the superiority claim only partially verifiable and is load-bearing for the central experimental claim.

    Authors: We agree that the abstract's brevity limits immediate verifiability of the central claims. The full manuscript provides these details: baseline implementations are described in Section 4.1 (including comparisons to MEMIT, ROME, and fine-tuning methods), exact metric definitions appear in Section 3.2 (unlearning efficacy as the proportion of target facts successfully forgotten, locality as average retention rate on non-target facts), graph-construction details in Section 3.1 (using standard KG datasets such as FB15k-237 with neighborhood expansion via connectivity), and data selection controls in the experimental setup (balanced sampling with explicit positive/negative controls to mitigate selection bias). To address the referee's concern directly in the abstract, we will revise it to include concise references to the evaluation protocol and direct readers to the main text for full specifications. This change will be incorporated. revision: yes

  2. Referee: [Abstract] Abstract (NEDS description): the method assumes that structural neighbors identified via KG connectivity are exactly the semantically correlated facts used by the LLM for reasoning-based leakage; no ablation, verification, or analysis is supplied to test this alignment, which directly determines whether distribution shaping fully disentangles the three effects without residual leakage or locality degradation.

    Authors: We acknowledge that the manuscript does not include a dedicated ablation or direct verification of the alignment between KG structural neighbors and the LLM's internal semantic correlations for reasoning-based leakage. The current evidence rests on the GONE benchmark results, which demonstrate that NEDS achieves complete disentanglement (1.000 efficacy with no residual leakage on direct facts or reasoning paths) while preserving locality at 0.839, outperforming baselines. However, we agree this assumption would benefit from explicit testing. We will add a new analysis subsection with probing experiments (e.g., measuring overlap between KG neighbors and model-activated facts via activation analysis) to quantify the alignment strength and confirm minimal residual effects. This revision will be made. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces GONE benchmark and NEDS framework as novel contributions for KG-based unlearning in LLMs. No load-bearing steps reduce by the paper's equations or self-citations to definitional inputs; graph connectivity is used as an external structural signal to shape distributions, with performance (efficacy 1.000, locality 0.839) reported from evaluations on independent public models (LLaMA-3-8B, Mistral-7B) and benchmarks. The derivation chain is self-contained against external data rather than tautological to fitted parameters or prior self-work.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 2 invented entities

The central claim rests on the domain assumption that knowledge graphs capture the relational structure relevant to unlearning leakage and that distribution shaping on neighbors produces a clean decision boundary; the paper introduces two new constructs (GONE and NEDS) without independent external validation of those constructs.

free parameters (1)
  • Neighborhood expansion parameters
    Parameters controlling which neighbors are treated as anchors and how distribution shaping is applied; specifics not detailed in abstract.
axioms (1)
  • domain assumption Knowledge stored in LLMs can be usefully represented and manipulated through graph connectivity for unlearning purposes
    Invoked in the design of both the GONE benchmark and the NEDS method.
invented entities (2)
  • GONE benchmark no independent evidence
    purpose: Disentangle direct fact removal, reasoning leakage, and catastrophic forgetting on structured KG data
    Newly created test suite introduced by the paper.
  • NEDS framework no independent evidence
    purpose: Enforce precise decision boundary via neighborhood-expanded distribution shaping
    Novel unlearning procedure proposed in the paper.

pith-pipeline@v0.9.0 · 5548 in / 1419 out tokens · 31510 ms · 2026-05-15T21:03:17.463796+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.