GONE: Structural Knowledge Unlearning via Neighborhood-Expanded Distribution Shaping

Ashutosh Balasubramaniam; Chahana Dahal; Zuobin Xiong

arxiv: 2603.12275 · v1 · submitted 2026-02-21 · 💻 cs.CL · cs.LG

GONE: Structural Knowledge Unlearning via Neighborhood-Expanded Distribution Shaping

Chahana Dahal , Ashutosh Balasubramaniam , Zuobin Xiong This is my paper

Pith reviewed 2026-05-15 21:03 UTC · model grok-4.3

classification 💻 cs.CL cs.LG

keywords knowledge unlearninglarge language modelsknowledge graphsgraph obliviondistribution shapingreasoning leakagestructured knowledgelocality preservation

0 comments

The pith

A graph-based method removes specific facts from LLMs while blocking reasoning leakage through connected knowledge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GONE, a benchmark that tests unlearning on knowledge-graph facts rather than isolated sentences. It identifies three separable effects: direct fact removal, leakage through multi-hop reasoning, and broad catastrophic forgetting. NEDS expands each target fact to its graph neighbors, then reshapes the model's output distribution to place a sharp boundary between the fact and its semantic neighborhood. This produces complete removal of the chosen fact together with high retention of related but non-targeted knowledge. A sympathetic reader would care because current unlearning techniques leave relational knowledge intact, allowing models to reconstruct deleted facts through inference.

Core claim

The central claim is that graph connectivity supplies the right set of anchor neighbors; shaping the distribution over that expanded neighborhood draws a precise decision boundary that erases the direct fact, prevents reasoning-based leakage to correlated nodes, and leaves unrelated knowledge untouched.

What carries the argument

Neighborhood-Expanded Distribution Shaping (NEDS), which uses graph connectivity to select correlated neighbors and then enforces a decision boundary separating the forgotten fact from its semantic neighborhood.

If this is right

LLMs can be made to forget specific relational facts without erasing the ability to reason over the remaining graph.
Unlearning evaluation can now separately measure direct removal, reasoning leakage, and general forgetting on the same benchmark.
The same neighborhood-expansion step raises unlearning efficacy to 1.000 and locality to 0.839 on LLaMA-3-8B and Mistral-7B.
Graph-aware distribution shaping offers a template for handling other structured data formats beyond simple knowledge graphs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The boundary-shaping idea could be tested on citation graphs or molecular graphs where facts are also linked by explicit relations.
Regulatory requests to delete specific training facts could be implemented by first extracting the relevant subgraph and then running NEDS on it.
If the method scales, it would let model owners demonstrate compliance with data-deletion rules without retraining from scratch.

Load-bearing premise

Graph links reliably mark the exact set of neighbors whose distribution shaping will cleanly separate direct fact removal from any reasoning-based reconstruction.

What would settle it

After applying NEDS to a target KG fact, the model still produces correct answers to questions that chain that fact with its immediate neighbors.

read the original abstract

Unlearning knowledge is a pressing and challenging task in Large Language Models (LLMs) because of their unprecedented capability to memorize and digest training data at scale, raising more significant issues regarding safety, privacy, and intellectual property. However, existing works, including parameter editing, fine-tuning, and distillation-based methods, are all focused on flat sentence-level data but overlook the relational, multi-hop, and reasoned knowledge in naturally structured data. In response to this gap, this paper introduces Graph Oblivion and Node Erasure (GONE), a benchmark for evaluating knowledge unlearning over structured knowledge graph (KG) facts in LLMs. This KG-based benchmark enables the disentanglement of three effects of unlearning: direct fact removal, reasoning-based leakage, and catastrophic forgetting. In addition, Neighborhood-Expanded Distribution Shaping (NEDS), a novel unlearning framework, is designed to leverage graph connectivity and identify anchor correlated neighbors, enforcing a precise decision boundary between the forgotten fact and its semantic neighborhood. Evaluations on LLaMA-3-8B and Mistral-7B across multiple knowledge editing and unlearning methods showcase NEDS's superior performance (1.000 on unlearning efficacy and 0.839 on locality) on GONE and other benchmarks. Code is available at https://anonymous.4open.science/r/GONE-4679/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GONE gives a KG-focused benchmark for unlearning and NEDS tries neighborhood expansion to control leakage, but the core claim rests on untested alignment between graph links and model reasoning.

read the letter

The main point is that this paper supplies a benchmark called GONE for testing unlearning on knowledge-graph facts instead of isolated sentences, along with NEDS, which expands to graph neighbors and shapes output distributions to separate direct fact removal from reasoning leakage. That addresses a clear gap, since most prior unlearning work stays at the sentence level and does not track multi-hop effects. The benchmark's attempt to measure three distinct outcomes—direct removal, leakage through reasoning, and broader forgetting—is a practical step forward, and the reported numbers on LLaMA-3-8B and Mistral-7B show NEDS reaching 1.000 efficacy with 0.839 locality, which is stronger than the baselines they compare against. Code release is also helpful for checking the implementation. The soft spot is the assumption that structural neighbors in the supplied KG line up exactly with the semantic correlations the model actually uses. If the graph misses some internalized links or includes spurious ones, distribution shaping will not fully block leakage or will degrade performance on unrelated facts. The abstract gives strong headline scores but leaves out details on baseline re-implementations, exact metric definitions, and graph construction, so the superiority claim is only partly verifiable from what is shown. This is useful for researchers working on LLM safety, privacy, or knowledge editing who already deal with structured data. A reader who needs concrete benchmarks or new unlearning techniques will get value from the framework even if the numbers require further checks. It is solid enough to deserve a serious referee, mainly because the problem is real and the proposed separation of effects is worth testing properly.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the GONE benchmark for evaluating knowledge unlearning on structured knowledge-graph facts in LLMs, designed to disentangle direct fact removal, reasoning-based leakage, and catastrophic forgetting. It proposes the NEDS framework, which uses graph connectivity to expand neighborhoods and shape output distributions around anchor neighbors, claiming to enforce a precise decision boundary; evaluations on LLaMA-3-8B and Mistral-7B report superior results with 1.000 unlearning efficacy and 0.839 locality versus prior editing and unlearning baselines.

Significance. If the alignment between KG connectivity and LLM-internal semantic correlations holds, the work would meaningfully advance unlearning for relational and multi-hop knowledge, addressing a clear gap in sentence-level methods. The benchmark's explicit disentanglement of leakage effects and the open code release are concrete strengths that would support follow-on research in safety and privacy applications.

major comments (2)

[Abstract] Abstract: the reported 1.000 efficacy and 0.839 locality scores are presented without any description of baseline implementations, exact metric definitions, graph-construction details, or controls for data selection; this absence makes the superiority claim only partially verifiable and is load-bearing for the central experimental claim.
[Abstract] Abstract (NEDS description): the method assumes that structural neighbors identified via KG connectivity are exactly the semantically correlated facts used by the LLM for reasoning-based leakage; no ablation, verification, or analysis is supplied to test this alignment, which directly determines whether distribution shaping fully disentangles the three effects without residual leakage or locality degradation.

minor comments (1)

[Abstract] Abstract: the code link is given as an anonymous repository; replace with a permanent, non-anonymous URL in the camera-ready version.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below with point-by-point responses, indicating planned revisions where appropriate to improve clarity and verifiability.

read point-by-point responses

Referee: [Abstract] Abstract: the reported 1.000 efficacy and 0.839 locality scores are presented without any description of baseline implementations, exact metric definitions, graph-construction details, or controls for data selection; this absence makes the superiority claim only partially verifiable and is load-bearing for the central experimental claim.

Authors: We agree that the abstract's brevity limits immediate verifiability of the central claims. The full manuscript provides these details: baseline implementations are described in Section 4.1 (including comparisons to MEMIT, ROME, and fine-tuning methods), exact metric definitions appear in Section 3.2 (unlearning efficacy as the proportion of target facts successfully forgotten, locality as average retention rate on non-target facts), graph-construction details in Section 3.1 (using standard KG datasets such as FB15k-237 with neighborhood expansion via connectivity), and data selection controls in the experimental setup (balanced sampling with explicit positive/negative controls to mitigate selection bias). To address the referee's concern directly in the abstract, we will revise it to include concise references to the evaluation protocol and direct readers to the main text for full specifications. This change will be incorporated. revision: yes
Referee: [Abstract] Abstract (NEDS description): the method assumes that structural neighbors identified via KG connectivity are exactly the semantically correlated facts used by the LLM for reasoning-based leakage; no ablation, verification, or analysis is supplied to test this alignment, which directly determines whether distribution shaping fully disentangles the three effects without residual leakage or locality degradation.

Authors: We acknowledge that the manuscript does not include a dedicated ablation or direct verification of the alignment between KG structural neighbors and the LLM's internal semantic correlations for reasoning-based leakage. The current evidence rests on the GONE benchmark results, which demonstrate that NEDS achieves complete disentanglement (1.000 efficacy with no residual leakage on direct facts or reasoning paths) while preserving locality at 0.839, outperforming baselines. However, we agree this assumption would benefit from explicit testing. We will add a new analysis subsection with probing experiments (e.g., measuring overlap between KG neighbors and model-activated facts via activation analysis) to quantify the alignment strength and confirm minimal residual effects. This revision will be made. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces GONE benchmark and NEDS framework as novel contributions for KG-based unlearning in LLMs. No load-bearing steps reduce by the paper's equations or self-citations to definitional inputs; graph connectivity is used as an external structural signal to shape distributions, with performance (efficacy 1.000, locality 0.839) reported from evaluations on independent public models (LLaMA-3-8B, Mistral-7B) and benchmarks. The derivation chain is self-contained against external data rather than tautological to fitted parameters or prior self-work.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 2 invented entities

The central claim rests on the domain assumption that knowledge graphs capture the relational structure relevant to unlearning leakage and that distribution shaping on neighbors produces a clean decision boundary; the paper introduces two new constructs (GONE and NEDS) without independent external validation of those constructs.

free parameters (1)

Neighborhood expansion parameters
Parameters controlling which neighbors are treated as anchors and how distribution shaping is applied; specifics not detailed in abstract.

axioms (1)

domain assumption Knowledge stored in LLMs can be usefully represented and manipulated through graph connectivity for unlearning purposes
Invoked in the design of both the GONE benchmark and the NEDS method.

invented entities (2)

GONE benchmark no independent evidence
purpose: Disentangle direct fact removal, reasoning leakage, and catastrophic forgetting on structured KG data
Newly created test suite introduced by the paper.
NEDS framework no independent evidence
purpose: Enforce precise decision boundary via neighborhood-expanded distribution shaping
Novel unlearning procedure proposed in the paper.

pith-pipeline@v0.9.0 · 5548 in / 1419 out tokens · 31510 ms · 2026-05-15T21:03:17.463796+00:00 · methodology

GONE: Structural Knowledge Unlearning via Neighborhood-Expanded Distribution Shaping

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)