
arxiv: 2507.04219 · v5 · pith:BYKVHEGKnew · submitted 2025-07-06 · 💻 cs.LG · cs.AI

Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs

classification 💻 cs.LG cs.AI
keywords unlearning, model, collapse, data, information, approach, effectively, llms

Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their fine-tuning data. We argue that this not only risks reinforcing exposure to sensitive data, but also fundamentally contradicts the principle of minimizing its use. As a remedy, we propose a novel unlearning method, Partial Model Collapse (PMC), which does not require unlearning targets in the unlearning objective. Our approach is inspired by recent observations that training generative models on their own generations leads to distribution collapse, effectively removing information from model outputs. Our central insight is that model collapse can be leveraged for machine unlearning by deliberately triggering it for the data we aim to remove. We show theoretically that our approach converges to the desired outcome, i.e., the model unlearns the data targeted for removal. We empirically demonstrate that PMC overcomes four key limitations of existing unlearning methods that explicitly optimize on unlearning targets, and that it more effectively removes private information from model outputs while preserving general model utility. Overall, our contributions represent an important step toward more comprehensive unlearning that better aligns with real-world privacy constraints. Code is available at https://www.cs.cit.tum.de/daml/partial-model-collapse/.
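The collapse phenomenon the abstract builds on can be illustrated with a toy example. The sketch below is not the PMC method and is not taken from the paper's code. It assumes a "model" that is just a categorical distribution over a few tokens, repeatedly refit by maximum likelihood on a small sample of its own outputs. The names `refit_on_own_samples` and `simulate_collapse`, and all parameter values, are hypothetical choices made for this illustration.

```python
import random
from collections import Counter


def refit_on_own_samples(probs, n_samples, rng):
    """One generation: sample from the current model, then refit it
    by maximum likelihood (empirical frequencies) on those samples."""
    tokens = list(range(len(probs)))
    samples = rng.choices(tokens, weights=probs, k=n_samples)
    counts = Counter(samples)
    return [counts[t] / n_samples for t in tokens]


def simulate_collapse(n_tokens=5, n_samples=10, generations=500, seed=0):
    """Toy illustration (assumed setup, not PMC): start from a uniform
    distribution and train it on its own generations repeatedly."""
    rng = random.Random(seed)
    probs = [1.0 / n_tokens] * n_tokens
    support_sizes = [n_tokens]  # number of tokens with non-zero probability
    for _ in range(generations):
        probs = refit_on_own_samples(probs, n_samples, rng)
        support_sizes.append(sum(p > 0 for p in probs))
    return probs, support_sizes


if __name__ == "__main__":
    final_probs, sizes = simulate_collapse()
    print("support size per generation (first 20):", sizes[:20])
    print("final distribution:", final_probs)
```

In this toy setting a token that is not sampled in one generation gets probability zero and can never return, so the support can only shrink and information about the original distribution is progressively lost. PMC, as described in the abstract, aims to trigger this kind of collapse only for the data targeted for removal; how that restriction is implemented is not shown here.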

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LLM Ghostbusters: Surgical Hallucination Suppression via Adaptive Unlearning

    cs.CR 2026-05 unverdicted novelty 6.0

    Adaptive Unlearning suppresses package hallucinations in code-generating LLMs by 81% while preserving benchmark performance, using model-generated data and no human labels.

  2. Position: The Term "Machine Unlearning" Is Overused in LLMs

    cs.CL 2026-05 accept novelty 5.0

    Machine unlearning should be restricted to dataset-defined deletion achieving retraining equivalence, while other LLM tasks require separate terminology and evaluation baselines.