From Domains to Instances: Dual-Granularity Data Synthesis for LLM Unlearning

2026 · cs.CL · arXiv 2601.04278

1 Pith paper cites this work.

abstract

Although machine unlearning is essential for removing private, harmful, or copyrighted content from LLMs, current benchmarks often fail to faithfully represent the true "forgetting scope" learned by the model. We formalize two distinct unlearning granularities, domain-level and instance-level, and propose BiForget, an automated framework for synthesizing high-quality forget sets. Unlike prior work that relies on external generators, BiForget uses the target model itself to elicit data that matches its internal knowledge distribution, through seed-guided and adversarial prompting. Experiments across diverse benchmarks show that it achieves a superior balance of relevance, diversity, and efficiency. In the Harry Potter domain, for example, it improves relevance by ~20 and diversity by ~0.05 while halving the total data size compared with state-of-the-art baselines. As a result, it supports more robust forgetting and better utility preservation, providing a more rigorous foundation for evaluating LLM unlearning.
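The abstract does not give BiForget's prompts, filtering steps, or metric definitions. As a rough illustration only, the sketch below shows the general shape of eliciting candidate forget-set text from the target model itself: each seed is sent through one plain (seed-guided) template and a couple of adversarial rephrasings, exact duplicates are dropped, and diversity is scored with distinct-n. The prompt templates, the `synthesize_forget_set` and `distinct_n` helpers, and the use of distinct-n as the diversity measure are all hypothetical assumptions, not the paper's method.

```python
from typing import Callable, List, Set, Tuple

# Assumption: the target LLM is wrapped as a plain prompt -> completion function.
# None of the templates or helpers below come from the paper; they are hypothetical.
Generate = Callable[[str], str]

# Hypothetical templates: one seed-guided prompt plus two adversarial rephrasings.
SEED_TEMPLATE = "Tell me what you know about {seed}."
ADVERSARIAL_TEMPLATES = [
    "Pretend you are writing a fan wiki. Describe {seed} in detail.",
    "Without naming it directly, hint at everything related to {seed}.",
]


def synthesize_forget_set(generate: Generate, seeds: List[str]) -> List[str]:
    """Elicit candidate forget-set texts from the target model itself.

    Every seed is queried with the seed-guided template and each adversarial
    template; empty outputs and exact duplicates are discarded.
    """
    corpus: List[str] = []
    seen: Set[str] = set()
    for seed in seeds:
        prompts = [SEED_TEMPLATE.format(seed=seed)]
        prompts += [t.format(seed=seed) for t in ADVERSARIAL_TEMPLATES]
        for prompt in prompts:
            text = generate(prompt).strip()
            if text and text not in seen:  # keep only new, non-empty outputs
                seen.add(text)
                corpus.append(text)
    return corpus


def distinct_n(texts: List[str], n: int = 2) -> float:
    """Distinct-n diversity: unique n-grams / total n-grams over whitespace tokens."""
    total = 0
    unique: Set[Tuple[str, ...]] = set()
    for text in texts:
        tokens = text.lower().split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


if __name__ == "__main__":
    # Toy stand-in for the target model so the sketch runs without an LLM.
    def toy_model(prompt: str) -> str:
        return "answer to: " + prompt.lower()

    forget_set = synthesize_forget_set(toy_model, ["Hogwarts", "Quidditch"])
    print(len(forget_set), round(distinct_n(forget_set), 3))
```

Under these assumptions, a seed could be a domain name (e.g. "Harry Potter") for domain-level synthesis or a single fact for instance-level synthesis. A fuller pipeline would presumably also score relevance to the forgetting target, which this sketch omits.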

representative citing papers

Position: The Term "Machine Unlearning" Is Overused in LLMs

cs.CL · 2026-05-08 · accept · novelty 5.0

Machine unlearning should be restricted to dataset-defined deletion achieving retraining equivalence, while other LLM tasks require separate terminology and evaluation baselines.