pith. machine review for the scientific record.

arxiv: 2604.13438 · v1 · submitted 2026-04-15 · 💻 cs.LG

Recognition: unknown

WIN-U: Woodbury-Informed Newton-Unlearning as a retain-free Machine Unlearning Framework

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 14:25 UTC · model grok-4.3

classification 💻 cs.LG
keywords machine unlearning · right to be forgotten · Newton method · Woodbury identity · Gauss-Newton approximation · privacy-preserving ML · retain-free unlearning · second-order optimization

The pith

WIN-U removes the influence of specific data from a trained model using a single Newton-style step that needs no access to the remaining training data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces WIN-U as a retain-free unlearning method that starts from a model trained on all data and applies one Newton-style update to excise the effect of a forget set. The update relies on the Woodbury matrix identity together with a generalized Gauss-Newton approximation to the curvature induced by the forget set. A sympathetic reader cares because privacy regulations increasingly require models to forget particular examples, yet full retraining on the kept data is often impossible when that data is private, expensive to store, or unavailable after initial training.

Core claim

Using the Woodbury matrix identity and a generalized Gauss-Newton approximation for the forget set curvature, the WIN-U update recovers the closed-form linear solution and serves as a local second-order approximation to the gold-standard retraining optimum. The method requires only second-order information computed on the originally trained model and performs the unlearning in a single step without ever accessing the retain set.
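A reconstruction of the linear (ridge) case makes the mechanism concrete. The notation below is assumed from the abstract's description rather than copied from the paper: $H$ is the regularized full-data curvature, $(X_f, y_f)$ the $m$ forget examples, and $H_f$ the forget-set curvature.

```latex
% Sketch of the linear case (notation assumed, not the paper's own):
% H = \tfrac{1}{n} X^{\top} X + \lambda I,\quad H_f = \tfrac{1}{n} X_f^{\top} X_f.
\theta_r^{*} \;=\; \bigl(H - H_f\bigr)^{-1}
  \Bigl(\tfrac{1}{n} X^{\top} y \;-\; \tfrac{1}{n} X_f^{\top} y_f\Bigr),
\qquad
\bigl(H - H_f\bigr)^{-1}
  \;=\; H^{-1} \;+\; \tfrac{1}{n}\, H^{-1} X_f^{\top}
    \Bigl(I_m - \tfrac{1}{n}\, X_f H^{-1} X_f^{\top}\Bigr)^{-1} X_f H^{-1}.
```

In this form the only inverse that touches forget data is $m \times m$, and the retain set never appears on the right-hand side.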

What carries the argument

The Woodbury matrix identity applied to a generalized Gauss-Newton approximation of the forget-set Hessian, which converts the Newton update into an efficient closed-form expression that does not require the retain data.
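The algebraic engine is a Woodbury "downdate": removing the forget-set curvature from the full-data curvature without forming the retain-set Gram matrix. A minimal numerical check, with illustrative variable names (not the paper's code):

```python
import numpy as np

# Check the Woodbury downdate that powers a retain-free Newton step:
# remove H_f = (1/n) X_f^T X_f from H using only H^{-1} and an m x m solve.
rng = np.random.default_rng(0)
n, m, d, lam = 200, 10, 5, 0.01
X = rng.normal(size=(n, d))
X_f = X[:m]                         # forget rows
H = X.T @ X / n + lam * np.eye(d)   # full-data regularized curvature

# Direct inverse of the downdated curvature (needs H - H_f explicitly).
direct = np.linalg.inv(H - X_f.T @ X_f / n)

# Woodbury form: only H and an m x m "capacitance" matrix are inverted.
H_inv = np.linalg.inv(H)
S = np.eye(m) - X_f @ H_inv @ X_f.T / n
woodbury = H_inv + H_inv @ X_f.T @ np.linalg.inv(S) @ X_f @ H_inv / n

assert np.allclose(direct, woodbury)
```

The m × m system is what keeps the single-step update cheap when the forget set is small relative to the model.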

If this is right

  • The approach achieves state-of-the-art unlearning efficacy and utility preservation on vision and language benchmarks.
  • It exhibits greater robustness to relearning attacks than prior retain-free or retain-dependent methods.
  • Unlearning becomes feasible in settings where the retain set cannot be stored or re-accessed after initial training.
  • The framework extends naturally to both linear and nonlinear models while preserving second-order local accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • In streaming or federated scenarios where data owners withdraw consent after deployment, a single-step method could allow models to be updated without recollecting the entire retained corpus.
  • The same Woodbury-plus-Gauss-Newton construction might be reused for other selective updates, such as removing the effect of a batch of recent examples or correcting for distribution shift in a subset of the data.
  • If the approximation quality holds under non-convex losses, the technique could reduce the computational barrier to repeated unlearning requests in large language models.

Load-bearing premise

The generalized Gauss-Newton approximation to the curvature of the loss on the forget set is accurate enough that a single Newton-style step produces a model close to the one obtained by retraining from scratch on the retain set alone.
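The premise is easiest to inspect where the GGN is exact. For binary logistic regression the GGN $\frac{1}{m} X_f^\top \mathrm{diag}(p(1-p)) X_f$ coincides with the true Hessian of the forget-set loss, which is the best case for the approximation; names below are illustrative:

```python
import numpy as np

# Generalized Gauss-Newton (GGN) curvature on a forget set, sketched for
# binary logistic regression, where GGN equals the exact Hessian.
rng = np.random.default_rng(1)
m, d = 8, 3
X_f = rng.normal(size=(m, d))
y_f = rng.integers(0, 2, size=m)
theta = rng.normal(size=d)

def grad(t):
    # Gradient of the mean logistic loss on the forget set.
    s = 1.0 / (1.0 + np.exp(-(X_f @ t)))
    return X_f.T @ (s - y_f) / m

# GGN: J^T diag(p(1-p)) J with J = X_f (Jacobian of the logits).
p = 1.0 / (1.0 + np.exp(-(X_f @ theta)))
ggn = (X_f * (p * (1 - p))[:, None]).T @ X_f / m

# Compare against a central-difference Hessian of the same loss.
eps = 1e-6
H_num = np.column_stack(
    [(grad(theta + eps * e) - grad(theta - eps * e)) / (2 * eps)
     for e in np.eye(d)]
)
assert np.allclose(ggn, H_num, atol=1e-5)
```

For deep non-convex losses the neglected second-derivative terms break this exact agreement, which is precisely the gap the referee flags.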

What would settle it

Retrain a model from scratch on the retain set after removing the forget set, apply WIN-U to the original model, and compare the two resulting models on parameter distance or performance on a held-out validation set drawn from the retain distribution; large discrepancies would show the single-step approximation fails to match the gold standard.
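In the ridge setting this test can be run end to end, and the paper's exact linear recovery claim predicts agreement to numerical precision. A sketch under assumed notation (the $n/(n-m)$ regularization rescaling follows the scaled-baseline convention; nothing here is the authors' code):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, d, lam = 100, 5, 4, 0.01
X = rng.normal(size=(n, d)); y = rng.normal(size=n)
X_f, y_f = X[:m], y[:m]          # forget set
X_r, y_r = X[m:], y[m:]          # retain set (gold-standard baseline only)

# Originally trained ridge model on all n points.
H = X.T @ X / n + lam * np.eye(d)
theta_full = np.linalg.solve(H, X.T @ y / n)

# Single retain-free Newton-style step: Woodbury downdate of H, then a
# move along the forget-set gradient evaluated at the trained model.
H_inv = np.linalg.inv(H)
S = np.eye(m) - X_f @ H_inv @ X_f.T / n            # m x m capacitance
Hr_inv = H_inv + H_inv @ X_f.T @ np.linalg.solve(S, X_f @ H_inv) / n
g_f = X_f.T @ (X_f @ theta_full - y_f) / n         # forget-set gradient
theta_unlearned = theta_full + Hr_inv @ g_f

# Gold standard: retrain from scratch on the retain set, regularization
# rescaled by n/(n-m) so the curvatures match.
lam_r = n / (n - m) * lam
theta_retrain = np.linalg.solve(
    X_r.T @ X_r / (n - m) + lam_r * np.eye(d),
    X_r.T @ y_r / (n - m),
)

assert np.linalg.norm(theta_unlearned - theta_retrain) < 1e-8
```

The open question is how large this parameter distance becomes for the nonlinear architectures in the paper, where no such comparison is reported.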

Figures

Figures reproduced from arXiv: 2604.13438 by Malik Magdon-Ismail, Mohammad Mohammadi Amiri, Xingjian Zhao.

Figure 1. WIN-U trade-off curves before and after benign relearning.
Figure 2. Qualitative example from the TOFU forget set illustrating relearning robustness.
Figure 3. WIN-U trade-off curves across varying MC sample sizes.
Original abstract

Privacy concerns in LLMs have led to the rapidly growing need to enforce a data's "right to be forgotten". Machine unlearning addresses precisely this task, namely the removal of the influence of some specific data, i.e., the forget set, from a trained model. The gold standard for unlearning is to produce the model that would have been learned on only the rest of the training data, i.e., the retain set. Most existing unlearning methods rely on direct access to the retained data, which may not be practical due to privacy or cost constraints. We propose WIN-U, a retained-data free unlearning framework that requires only second order information for the originally trained model on the full data. The unlearning is performed using a single Newton-style step. Using the Woodbury matrix identity and a generalized Gauss-Newton approximation for the forget set curvature, the WIN-U update recovers the closed-form linear solution and serves as a local second-order approximation to the gold-standard retraining optimum. Extensive experiments on various vision and language benchmarks demonstrate that WIN-U achieves SOTA performance in terms of unlearning efficacy and utility preservation, while being more robust against relearning attacks compared to existing methods. Importantly, WIN-U does not require access to the retained data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes WIN-U, a retain-free machine unlearning method that performs a single Newton-style update on a trained model using only its second-order information. It applies the Woodbury matrix identity together with a generalized Gauss-Newton approximation to the forget-set curvature, claiming that the resulting update exactly recovers the closed-form linear solution and provides a local second-order approximation to the gold-standard model obtained by retraining from scratch on the retain set alone. Experiments on vision and language benchmarks report state-of-the-art unlearning efficacy, utility preservation, and improved robustness to relearning attacks.

Significance. If the curvature approximation remains sufficiently accurate, the framework would be a meaningful advance for privacy-preserving unlearning in settings where the retain set cannot be stored or re-accessed. The retain-free property, the exact recovery for linear models, and the empirical SOTA results on multiple benchmarks are clear strengths. The absence of error bounds or direct verification against retraining, however, leaves the practical significance for nonlinear deep models uncertain.

major comments (2)
  1. [Abstract / derivation of WIN-U update] Abstract and the derivation of the WIN-U update: the claim that the Woodbury-adjusted Newton step with generalized Gauss-Newton curvature yields a local second-order approximation to retraining is stated without an explicit error analysis or bound on the omitted second-derivative terms of the loss. For non-convex deep-network objectives these terms are not guaranteed to be negligible after removal of the forget set, and a single step performed without any retain data can propagate the resulting bias directly into the final parameters.
  2. [Experiments] Experimental evaluation: while SOTA unlearning and utility numbers are reported, the manuscript contains no quantitative comparison (e.g., parameter or output distance, or downstream-task gap) between the WIN-U model and the gold-standard retrained model on the retain set for the nonlinear architectures used. Without such verification the empirical results do not directly substantiate the central approximation claim.
minor comments (2)
  1. [Method] The notation distinguishing the original Hessian, the forget-set Gauss-Newton matrix, and the Woodbury correction could be made more explicit with a single summary equation.
  2. [Figures] Figure captions and axis labels should explicitly state whether reported metrics are computed on the forget set, retain set, or both.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed review and constructive feedback on our manuscript. We appreciate the recognition of the retain-free property, exact recovery for linear models, and empirical results as strengths. We address the two major comments point by point below and will revise the manuscript accordingly to strengthen the presentation.

Point-by-point responses
  1. Referee: [Abstract / derivation of WIN-U update] Abstract and the derivation of the WIN-U update: the claim that the Woodbury-adjusted Newton step with generalized Gauss-Newton curvature yields a local second-order approximation to retraining is stated without an explicit error analysis or bound on the omitted second-derivative terms of the loss. For non-convex deep-network objectives these terms are not guaranteed to be negligible after removal of the forget set, and a single step performed without any retain data can propagate the resulting bias directly into the final parameters.

    Authors: We agree that an explicit error analysis would strengthen the theoretical contribution. The derivation establishes that the WIN-U update exactly recovers the closed-form solution for linear models via the Woodbury identity applied to the GGN approximation of the forget-set curvature. For nonlinear models, this constitutes a local second-order approximation around the original parameters, as the Newton step targets the minimizer of an approximated retain loss. However, we acknowledge the absence of bounds on the higher-order terms neglected by the GGN approximation and the potential for bias propagation in a single step without retain data. In the revised version, we will expand the discussion in the derivation section to clarify the nature of the approximation, highlight the conditions (e.g., small forget-set size or near-convexity) under which it is expected to be accurate, and explicitly note the lack of rigorous error bounds as a limitation of the current analysis. revision: yes

  2. Referee: [Experiments] Experimental evaluation: while SOTA unlearning and utility numbers are reported, the manuscript contains no quantitative comparison (e.g., parameter or output distance, or downstream-task gap) between the WIN-U model and the gold-standard retrained model on the retain set for the nonlinear architectures used. Without such verification the empirical results do not directly substantiate the central approximation claim.

    Authors: We concur that direct verification against the gold-standard retrained model would provide valuable evidence supporting the approximation claim for nonlinear models. Our current experiments focus on comparing WIN-U against existing unlearning baselines in terms of unlearning efficacy (measured by attack success rates) and utility preservation (accuracy on retain data), where it achieves state-of-the-art results. We did not include direct distance metrics to the retrained model, as the primary goal was to demonstrate practical performance without access to retain data during unlearning. Nevertheless, for the smaller-scale benchmarks in the paper, computing the retrained model is feasible. In the revision, we will add quantitative comparisons, such as L2 parameter distance and output KL divergence or accuracy gap on downstream tasks, between WIN-U and the retrained model for at least the vision benchmarks (e.g., CIFAR-10) and one language task to better substantiate the claims. revision: yes
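The metrics the rebuttal promises are straightforward to specify. A hypothetical helper (these function names and the softmax-output setting are ours, not the paper's evaluation code):

```python
import numpy as np

def param_l2(theta_a, theta_b):
    # L2 distance between two parameter vectors.
    return float(np.linalg.norm(theta_a - theta_b))

def mean_output_kl(logits_a, logits_b):
    # Mean KL(p_a || p_b) over examples, computed from raw logits.
    def log_softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    la, lb = log_softmax(logits_a), log_softmax(logits_b)
    return float(np.mean(np.sum(np.exp(la) * (la - lb), axis=1)))

# Identical models score zero on both metrics.
z = np.array([[2.0, 0.5, -1.0], [0.0, 1.0, 0.0]])
assert param_l2(np.ones(3), np.ones(3)) == 0.0
assert abs(mean_output_kl(z, z)) < 1e-12
```

Reporting both matters: two models can be close in outputs while far in parameters, so the parameter distance is the stricter test of the approximation claim.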

Circularity Check

0 steps flagged

No circularity: derivation applies external matrix identities and standard approximations to original model information

full rationale

The paper's core derivation applies the Woodbury matrix identity (a standard algebraic result independent of the paper) and a generalized Gauss-Newton approximation (a conventional second-order technique in optimization literature) to the Hessian and gradient information of the originally trained model. It recovers the exact closed-form solution for the linear case as a direct algebraic consequence under the stated assumptions, and positions the general case as a local approximation to retraining. No steps reduce to self-definition, fitted parameters renamed as predictions, or load-bearing self-citations; the central claim rests on the external validity of the identities and the empirical accuracy of the curvature approximation rather than tautological equivalence to inputs. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the accuracy of the generalized Gauss-Newton approximation for the forget-set curvature and on the assumption that a single Newton step suffices to reach a local approximation of the retrained optimum; no free parameters or new entities are introduced in the abstract.

axioms (1)
  • domain assumption A generalized Gauss-Newton approximation accurately captures the curvature of the loss on the forget set.
    Invoked to enable the Newton-style update without the full Hessian.

pith-pipeline@v0.9.0 · 5531 in / 1414 out tokens · 35402 ms · 2026-05-10T14:25:41.310455+00:00 · methodology

