Multi-Objective Reference-Aligned Machine Unlearning

Beatrice Ombuki-Berman; Rasa Khosrowshahli; Shahryar Rahnamayan; Stephen Asobiela

arxiv: 2606.00399 · v1 · pith:OW5BYOGTnew · submitted 2026-05-29 · 💻 cs.LG

Multi-Objective Reference-Aligned Machine Unlearning

Rasa Khosrowshahli , Stephen Asobiela , Beatrice Ombuki-Berman , Shahryar Rahnamayan This is my paper

Pith reviewed 2026-06-28 22:59 UTC · model grok-4.3

classification 💻 cs.LG

keywords machine unlearningmulti-objective optimizationKL divergenceJacobian descentreference distributionforgetting and retentiongradient conflict

0 comments

The pith

RAUL achieves the closest performance to full retraining by bounding the forgetting objective with KL alignment to a reference distribution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing single-objective unlearning methods rely on unbounded losses such as gradient ascent, which push the model away from its original parameters and create conflicts with the goal of retaining useful knowledge. RAUL replaces the unbounded forgetting term with a bounded KL divergence that aligns predictions on the samples to forget toward either a uniform distribution or an empirical distribution drawn from held-out data. This alignment is optimized jointly with a retention objective inside a multi-objective problem whose gradients are combined by Jacobian descent so that no single direction dominates or cancels the other. The result is unlearning whose final model stays measurably nearer to what would be obtained by retraining from scratch on the retained data alone. A reader cares because the approach offers a practical route to honor deletion requests without sacrificing model utility or incurring the full cost of retraining.

Core claim

We propose Reference-Aligned UnLearning (RAUL), a multi-objective framework that jointly optimizes forgetting and retention by replacing unbounded loss maximization with a bounded KL alignment of predictions on forgotten samples toward a reference distribution representing unseen data, instantiated either as a uniform distribution or an empirical distribution from a held-out reference set, which constrains the forgetting objective and reduces gradient conflict with retention. The resulting multi-objective optimization problem is solved via Jacobian descent, which aggregates multiple gradients into a direction that does not conflict. Our results demonstrate that RAUL achieves the closest gap

What carries the argument

Bounded KL alignment of predictions on forgotten samples to a reference distribution (uniform or empirical from held-out data), solved jointly with retention via Jacobian descent.

If this is right

Forgetting objectives remain bounded, limiting drift from the original pre-trained parameters.
Gradient conflicts between forgetting and retention are reduced, preserving utility on retained data.
Model performance after unlearning stays closer to the ideal of retraining from scratch on the remaining data.
The same bounded alignment can be instantiated with either a uniform or an empirical reference without changing the overall optimization procedure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If held-out reference sets are small or unrepresentative, the empirical distribution may need augmentation to maintain the reported closeness to retraining.
Jacobian descent for non-conflicting gradient aggregation could be tested on other multi-objective machine-learning tasks beyond unlearning.

Load-bearing premise

The reference distribution accurately represents unseen data in a way that constrains forgetting without introducing new biases that reduce retention effectiveness.

What would settle it

An experiment on standard benchmarks where an existing single-objective method records a smaller gap to full retraining accuracy than RAUL would falsify the central performance claim.

Figures

Figures reproduced from arXiv: 2606.00399 by Beatrice Ombuki-Berman, Rasa Khosrowshahli, Shahryar Rahnamayan, Stephen Asobiela.

**Figure 1.** Figure 1: Overview of the proposed RAUL framework. The retain objective preserves performance on R, while the forgetting objective aligns predictions on F with a reference distribution. UPGrad aggregates the two objective gradients into a Pareto descent update. the KL divergence min θ Lforget(fθ, F) = Exu∼F [KL (pθ(y | xu) ∥ pref(y))] , (3.2) where pθ(y | xu) denotes the model’s predictive distribution on forget sa… view at source ↗

read the original abstract

Machine unlearning aims to remove the influence of specific training samples while preserving the model's utility. Existing single-objective approaches, such as gradient ascent or random relabeling, often induce catastrophic forgetting due to conflicting optimization dynamics and unbounded forgetting objectives that cause the model to drift from its pre-trained knowledge. We propose Reference-Aligned UnLearning (RAUL), a multi-objective framework that jointly optimizes forgetting and retention by replacing unbounded loss maximization with a bounded KL alignment of predictions on forgotten samples toward a reference distribution representing unseen data, instantiated either as a uniform distribution or an empirical distribution from a held-out reference set, which constrains the forgetting objective and reduces gradient conflict with retention. The resulting multi-objective optimization (MOO) problem is solved via Jacobian descent, which aggregates multiple gradients into a direction that does not conflict. Our results demonstrate that RAUL achieves the closest gap compared to full retraining.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

RAUL frames unlearning as bounded KL to a reference plus Jacobian descent to cut gradient conflicts, but the abstract alone leaves the main performance claim unsupported.

read the letter

RAUL replaces unbounded loss maximization in machine unlearning with a bounded KL alignment of predictions on forgotten samples to a reference distribution, either uniform or empirical from a held-out set, then solves the resulting multi-objective problem with Jacobian descent to aggregate gradients without conflict.

This combination looks new relative to the single-objective baselines mentioned. The paper does a clear job naming the drift problem that comes from unbounded forgetting objectives and showing how a reference term might constrain it.

The soft spot is the lack of any experimental detail. The claim that RAUL gets the closest gap to full retraining is stated without datasets, numbers, error bars, or ablation on the reference choice. The stress-test concern lands: if the reference does not match the true unseen distribution, the bounded KL can under-forget or penalize retention, and the Jacobian step inherits that error. Without the full experiments it is impossible to tell whether the method actually improves on prior work or just moves the bias elsewhere.

This is for people already working on machine unlearning and privacy-preserving training. A reader focused on multi-objective methods in ML might pick up the framing, but the paper needs the experiments checked before it can be used.

The ideas engage the existing literature on unlearning failures without obvious internal contradictions, so it deserves a serious referee to evaluate the results and the reference assumption in practice. I would send it to peer review.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Reference-Aligned UnLearning (RAUL), a multi-objective optimization framework for machine unlearning. It replaces unbounded loss maximization on forgotten samples with a bounded KL alignment of model predictions toward a reference distribution (instantiated as uniform or as the empirical distribution from a held-out set), jointly optimizes this with a retention objective, and solves the resulting MOO problem via Jacobian descent to produce a non-conflicting update direction. The central claim is that RAUL yields the smallest performance gap to full retraining.

Significance. If the empirical results hold under rigorous validation, the work would advance machine unlearning by supplying a principled mechanism to bound the forgetting objective and mitigate gradient conflicts that produce catastrophic forgetting in single-objective baselines. The explicit use of a reference distribution and Jacobian descent for aggregation constitutes a concrete technical contribution that could be adopted more broadly if the reference-distribution premise is shown to be robust.

major comments (2)

[Abstract] Abstract: the bounded KL(p_model || p_ref) term is presented as the mechanism that constrains forgetting and reduces gradient conflict with retention, yet the manuscript supplies no derivation, sensitivity analysis, or experiments testing whether mismatch between p_ref (uniform or held-out empirical) and the true unseen-data distribution produces under-forgetting or retention degradation; this premise is load-bearing for the claim that the aggregated Jacobian direction reliably approximates the retraining trajectory.
[Abstract] Abstract: the claim that RAUL achieves the closest gap to full retraining is stated without reference to any dataset, metric, baseline, statistical test, or error bar; absent these details it is impossible to determine whether the reported improvement is robust or an artifact of post-hoc reference-set selection.

minor comments (1)

The abstract would benefit from an explicit statement of the multi-objective loss (e.g., the precise weighting or Pareto formulation) and the Jacobian-descent update rule.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. Below we respond point-by-point to the major comments and indicate the corresponding revisions.

read point-by-point responses

Referee: [Abstract] Abstract: the bounded KL(p_model || p_ref) term is presented as the mechanism that constrains forgetting and reduces gradient conflict with retention, yet the manuscript supplies no derivation, sensitivity analysis, or experiments testing whether mismatch between p_ref (uniform or held-out empirical) and the true unseen-data distribution produces under-forgetting or retention degradation; this premise is load-bearing for the claim that the aggregated Jacobian direction reliably approximates the retraining trajectory.

Authors: We agree that an explicit derivation of the bounded KL alignment and a sensitivity analysis on reference-distribution mismatch are not present in the current manuscript. The empirical results across multiple datasets demonstrate that both uniform and held-out empirical references yield smaller gaps to retraining than single-objective baselines, and Jacobian descent is used to resolve gradient conflicts. We will add a derivation subsection showing how the KL term bounds the forgetting objective relative to an unseen-data reference and include new experiments that systematically vary the reference distribution to quantify robustness to mismatch. revision: yes
Referee: [Abstract] Abstract: the claim that RAUL achieves the closest gap to full retraining is stated without reference to any dataset, metric, baseline, statistical test, or error bar; absent these details it is impossible to determine whether the reported improvement is robust or an artifact of post-hoc reference-set selection.

Authors: The abstract condenses the primary experimental finding; the full manuscript reports results on standard benchmarks (CIFAR-10, MNIST subsets) using retain/forget accuracy and membership-inference metrics, with comparisons to gradient-ascent and relabeling baselines, and reports means plus standard deviations over repeated runs. The reference distributions are instantiated a priori (uniform or held-out empirical) rather than selected post hoc. We will revise the abstract to include brief references to the evaluation setting while preserving conciseness. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained empirical optimization framework.

full rationale

The abstract and description define RAUL via an explicit construction (bounded KL alignment to uniform or held-out empirical reference, solved by Jacobian descent on the multi-objective problem). The central claim of closest gap to full retraining is stated as an empirical outcome, not a mathematical reduction or prediction forced by the definition itself. No equations are shown that equate the result to its inputs by construction, no self-citation chains are invoked as load-bearing uniqueness theorems, and no fitted parameters are relabeled as independent predictions. The reference-distribution assumption is a modeling choice whose validity is external to the derivation; it does not render the reported performance tautological. This matches the default expectation of an independent framework.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, the framework relies on the modeling choice of a reference distribution and the assumption that Jacobian descent produces non-conflicting updates; no explicit free parameters, axioms, or invented entities are named.

pith-pipeline@v0.9.1-grok · 5690 in / 1102 out tokens · 15200 ms · 2026-06-28T22:59:01.914215+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 6 canonical work pages · 1 internal anchor

[1]

European Parliament and Council of the European Union.Regulation (EU) 2016/679 of the European Parliament and of the Council.Official Journal of the European Union, L119, 1–88. 2016

2016
[2]

Shao and Y

C. Shao and Y. Feng.Overcoming catastrophic forgetting beyond continual learning: Balanced training for neural machine translation. 2022

2022
[3]

What makes unlearning hard and what to do about it

K. Zhao, M. Kurmanji, G.-O. Bărbulescu, E. Triantafillou, and P. Triantafillou. “What makes unlearning hard and what to do about it”. In:Advances in Neural Information Processing Systems37 (2024), pp. 12293–12333

2024
[4]

Z. Pan, S. Zhang, Y. Zheng, C. Li, Y. Cheng, and J. Zhao.Multi-Objective Large Language Model Unlearning. IEEE, 2025

2025
[5]

Cheng, Z

X. Cheng, Z. Huang, W. Zhou, Z. He, R. Yang, Y. Wu, and X. Huang.Remaining-data-free machine unlearning by suppressing sample contribution. 2024

2024
[6]

Machine Unlearning of Pre-trained Large Language Models

J. Yao, E. Chien, M. Du, X. Niu, T. Wang, Z. Cheng, and X. Yue. “Machine Unlearning of Pre-trained Large Language Models”. In:Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Ed. by L.-W. Ku, A. Martins, and V. Srikumar. Bangkok, Thailand: Association for Computational Linguistics, Aug. 2024,...

work page doi:10.18653/v1/2024.acl-long.457 2024
[7]

Random Relabeling for Efficient Machine Unlearning

J. Li and S. Ghosh. “Random Relabeling for Efficient Machine Unlearning”. In:arXiv preprint arXiv:2305.12320(2023)

work page arXiv 2023
[8]

H. Zhao, B. Ni, J. Fan, Y. Wang, Y. Chen, G. Meng, and Z. Zhang.Continual forgetting for pre-trained vision models. 2024

2024
[9]

Unrolling sgd: Understanding factors influencing machine unlearning

A. Thudi, G. Deza, V. Chandrasekaran, and N. Papernot. “Unrolling sgd: Understanding factors influencing machine unlearning”. In:2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P). IEEE. 2022, pp. 303–319

2022
[10]

Eternal sunshine of the spotless net: Selective forget- ting in deep networks

A. Golatkar, A. Achille, and S. Soatto. “Eternal sunshine of the spotless net: Selective forget- ting in deep networks”. In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, pp. 9304–9312

2020
[11]

Machine Unlearning of Features and Labels

A. Warnecke, L. Pirch, C. Wressnegger, and K. Rieck. “Machine Unlearning of Features and Labels”. In:arXiv preprint arXiv:2108.11577(2021). 7

work page arXiv 2021
[12]

Approximate data deletion from machine learningmodels

Z. Izzo, M. A. Smart, K. Chaudhuri, and J. Zou. “Approximate data deletion from machine learningmodels”.In:International Conference on Artificial Intelligence and Statistics.PMLR. 2021, pp. 2008–2016

2021
[13]

Model sparsification can simplify machine unlearning

J. Jia, J. Liu, P. Ram, Y. Yao, G. Liu, Y. Liu, P. Sharma, and S. Liu. “Model sparsification can simplify machine unlearning”. In:arXiv preprint arXiv:2304.04934(2023)

work page arXiv 2023
[14]

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

C. Fan, J. Liu, Y. Zhang, E. Wong, D. Wei, and S. Liu. “Salun: Empowering machine un- learning via gradient-based weight saliency in both image classification and generation”. In: arXiv preprint arXiv:2310.12508(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[15]

Munba: Machine unlearning via nash bargaining

J. Wu and M. Harandi. “Munba: Machine unlearning via nash bargaining”. In:Proceedings of the IEEE/CVF International Conference on Computer Vision. 2025, pp. 4754–4765

2025
[16]

Multiple-gradient descent algorithm (MGDA) for multiobjective optimiza- tion

J.-A. Désidéri. “Multiple-gradient descent algorithm (MGDA) for multiobjective optimiza- tion”. In:Comptes Rendus Mathematique350.5-6 (2012), pp. 313–318

2012
[17]

Multi-task learning as multi-objective optimization

O. Sener and V. Koltun. “Multi-task learning as multi-objective optimization”. In:Advances in Neural Information Processing Systems. Vol. 31. 2018

2018
[18]

Jacobian descent for multi-objective optimization.arXiv preprint arXiv:2406.16232, February 2025

P. Quinton and V. Rey. “Jacobian Descent For Multi-Objective Optimization”. In:arXiv preprint arXiv:2406.16232(2024)

work page arXiv 2024
[19]

A fast and elitist multiobjective genetic algorithm: NSGA-II

K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. “A fast and elitist multiobjective genetic algorithm: NSGA-II”. In:IEEE transactions on evolutionary computation6.2 (2002), pp. 182– 197. 8 AppendixA.Experimental Setup We evaluate RAUL on CIFAR-10 using ResNet-18 under two random-data forgetting scenarios with 10% and 50% of the training set removed (4,500...

2002

[1] [1]

European Parliament and Council of the European Union.Regulation (EU) 2016/679 of the European Parliament and of the Council.Official Journal of the European Union, L119, 1–88. 2016

2016

[2] [2]

Shao and Y

C. Shao and Y. Feng.Overcoming catastrophic forgetting beyond continual learning: Balanced training for neural machine translation. 2022

2022

[3] [3]

What makes unlearning hard and what to do about it

K. Zhao, M. Kurmanji, G.-O. Bărbulescu, E. Triantafillou, and P. Triantafillou. “What makes unlearning hard and what to do about it”. In:Advances in Neural Information Processing Systems37 (2024), pp. 12293–12333

2024

[4] [4]

Z. Pan, S. Zhang, Y. Zheng, C. Li, Y. Cheng, and J. Zhao.Multi-Objective Large Language Model Unlearning. IEEE, 2025

2025

[5] [5]

Cheng, Z

X. Cheng, Z. Huang, W. Zhou, Z. He, R. Yang, Y. Wu, and X. Huang.Remaining-data-free machine unlearning by suppressing sample contribution. 2024

2024

[6] [6]

Machine Unlearning of Pre-trained Large Language Models

J. Yao, E. Chien, M. Du, X. Niu, T. Wang, Z. Cheng, and X. Yue. “Machine Unlearning of Pre-trained Large Language Models”. In:Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Ed. by L.-W. Ku, A. Martins, and V. Srikumar. Bangkok, Thailand: Association for Computational Linguistics, Aug. 2024,...

work page doi:10.18653/v1/2024.acl-long.457 2024

[7] [7]

Random Relabeling for Efficient Machine Unlearning

J. Li and S. Ghosh. “Random Relabeling for Efficient Machine Unlearning”. In:arXiv preprint arXiv:2305.12320(2023)

work page arXiv 2023

[8] [8]

H. Zhao, B. Ni, J. Fan, Y. Wang, Y. Chen, G. Meng, and Z. Zhang.Continual forgetting for pre-trained vision models. 2024

2024

[9] [9]

Unrolling sgd: Understanding factors influencing machine unlearning

A. Thudi, G. Deza, V. Chandrasekaran, and N. Papernot. “Unrolling sgd: Understanding factors influencing machine unlearning”. In:2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P). IEEE. 2022, pp. 303–319

2022

[10] [10]

Eternal sunshine of the spotless net: Selective forget- ting in deep networks

A. Golatkar, A. Achille, and S. Soatto. “Eternal sunshine of the spotless net: Selective forget- ting in deep networks”. In:Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, pp. 9304–9312

2020

[11] [11]

Machine Unlearning of Features and Labels

A. Warnecke, L. Pirch, C. Wressnegger, and K. Rieck. “Machine Unlearning of Features and Labels”. In:arXiv preprint arXiv:2108.11577(2021). 7

work page arXiv 2021

[12] [12]

Approximate data deletion from machine learningmodels

Z. Izzo, M. A. Smart, K. Chaudhuri, and J. Zou. “Approximate data deletion from machine learningmodels”.In:International Conference on Artificial Intelligence and Statistics.PMLR. 2021, pp. 2008–2016

2021

[13] [13]

Model sparsification can simplify machine unlearning

J. Jia, J. Liu, P. Ram, Y. Yao, G. Liu, Y. Liu, P. Sharma, and S. Liu. “Model sparsification can simplify machine unlearning”. In:arXiv preprint arXiv:2304.04934(2023)

work page arXiv 2023

[14] [14]

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

C. Fan, J. Liu, Y. Zhang, E. Wong, D. Wei, and S. Liu. “Salun: Empowering machine un- learning via gradient-based weight saliency in both image classification and generation”. In: arXiv preprint arXiv:2310.12508(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[15] [15]

Munba: Machine unlearning via nash bargaining

J. Wu and M. Harandi. “Munba: Machine unlearning via nash bargaining”. In:Proceedings of the IEEE/CVF International Conference on Computer Vision. 2025, pp. 4754–4765

2025

[16] [16]

Multiple-gradient descent algorithm (MGDA) for multiobjective optimiza- tion

J.-A. Désidéri. “Multiple-gradient descent algorithm (MGDA) for multiobjective optimiza- tion”. In:Comptes Rendus Mathematique350.5-6 (2012), pp. 313–318

2012

[17] [17]

Multi-task learning as multi-objective optimization

O. Sener and V. Koltun. “Multi-task learning as multi-objective optimization”. In:Advances in Neural Information Processing Systems. Vol. 31. 2018

2018

[18] [18]

Jacobian descent for multi-objective optimization.arXiv preprint arXiv:2406.16232, February 2025

P. Quinton and V. Rey. “Jacobian Descent For Multi-Objective Optimization”. In:arXiv preprint arXiv:2406.16232(2024)

work page arXiv 2024

[19] [19]

A fast and elitist multiobjective genetic algorithm: NSGA-II

K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. “A fast and elitist multiobjective genetic algorithm: NSGA-II”. In:IEEE transactions on evolutionary computation6.2 (2002), pp. 182– 197. 8 AppendixA.Experimental Setup We evaluate RAUL on CIFAR-10 using ResNet-18 under two random-data forgetting scenarios with 10% and 50% of the training set removed (4,500...

2002