LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning

Marius Mosbach; Matteo Boglioni; Siva Reddy; Thibault Rousset; Verna Dankers

arxiv: 2607.02513 · v1 · pith:E7UT3MXQnew · submitted 2026-07-02 · 💻 cs.CL · cs.AI· cs.LG

LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning

Matteo Boglioni , Thibault Rousset , Siva Reddy , Marius Mosbach , Verna Dankers This is my paper

Pith reviewed 2026-07-03 14:09 UTC · model grok-4.3

classification 💻 cs.CL cs.AIcs.LG

keywords LLM unlearninglocalization precisionPII removaltestbedresurfacing attacksparameter evaluationOLMo modelscontinual pretraining

0 comments

The pith

A new testbed shows that current LLM unlearning methods rarely target the exact parameters storing sensitive data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates LACUNA to inject personally identifiable information into known parameters of 1B and 7B OLMo-based models using masked continual pretraining. This setup gives ground-truth locations so researchers can check whether unlearning actually changes those weights or only alters outputs. Tests of existing methods reveal they perform well on output metrics yet leave the original parameters largely untouched and remain open to attacks that restore the data. When methods do hit the right parameters, even a basic gradient-based approach produces lasting erasure that resists resurfacing.

Core claim

LACUNA supplies the first benchmark with explicit parameter-level ground truth for unlearning. By embedding PII into predefined weights, it measures localization accuracy directly. Current state-of-the-art localize-first methods prove imprecise and vulnerable to resurfacing attacks. In contrast, successful localization allows a simple gradient-based unlearning procedure to achieve strong erasure and robustness.

What carries the argument

LACUNA testbed, which injects PII into predefined model parameters via masked continual pretraining to create measurable ground truth for localization precision.

If this is right

Output-only benchmarks are insufficient; parameter-level verification is required to confirm erasure.
Imprecise localization leaves models open to simple attacks that recover the original information.
Once localization succeeds, basic gradient updates can produce robust unlearning.
Future method development should prioritize accurate identification of storage locations over output masking alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The testbed could be adapted to measure localization for other memorized content such as facts or code.
If real training stores knowledge more diffusely than the controlled injection, current methods may perform even worse than reported.
Organizations could run localization checks on deployed models before claiming compliance with data-removal requests.
The gap between output success and parameter change suggests unlearning research should treat localization as a separate sub-problem.

Load-bearing premise

The assumption that PII injected through masked continual pretraining into chosen parameters behaves like knowledge acquired during ordinary training.

What would settle it

Direct inspection of model weights after unlearning that shows current methods consistently alter the injected parameters and leave unrelated weights unchanged.

Figures

Figures reproduced from arXiv: 2607.02513 by Marius Mosbach, Matteo Boglioni, Siva Reddy, Thibault Rousset, Verna Dankers.

**Figure 1.** Figure 1: Overview of our pipeline. Phase 1: PII data is mixed with pretraining data and injected into a model via masked continual pretraining, followed by instruction tuning. Phase 2: Unlearning methods attempt to remove memorized PII while preserving retained knowledge. Phase 3: Localization precision is evaluated using the ground-truth mask, while behavioral evaluation measures output-level unlearning success an… view at source ↗

**Figure 2.** Figure 2: PII data management. We inject 1, 200 PII profiles, divided in 6 groups. 150 profiles [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Comparative analysis of model performance and memorization across training [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: OLMo2 1B - Unlearning Evaluation for email address. In 4a we plot forget, retain, and utility for each method. The line represents utility prior to unlearning. 4b reports the evaluation of our proposed localization precision metric for all unlearning methods. Output-level metrics Following established practices (Dorna et al., 2025), we evaluate unlearning success using three metrics (formalized in Appendix… view at source ↗

**Figure 5.** Figure 5: Relearning vulnerability for email address. Unlearning performance Figure 4a shows the performance of all approaches on the forget, retain, and utility evaluations. The performance on the forget set consistently shows that AlphaEdit is the least performant,9 followed by MemFlex. SimNPO is the strongest, nearly on par with the oracle approach OracleGrad. The retain performance echoes this. In terms of utili… view at source ↗

**Figure 6.** Figure 6: Memorization Comparison between Masked Training and Unmasked Training models, the latter clearly achieves higher extraction rates as it’s less constrained. Choice of mask coverage. The 5% coverage used throughout LACUNA reflects a tradeoff: a smaller mask yields a more precise localization target, but the masked weights must still be able to memorize the injected PII. We found this lower bound to be fairly… view at source ↗

**Figure 7.** Figure 7: Unlearning Evaluation for OLMo2 1B, for three PII fields (see the main paper for email address). On the left-hand side, we visualize Forget, Retain, and Utility metrics. The on Utility represents the Pre-Unlearning results. On the right, we report the evaluation of our proposed Localization Precision metric over AlphaEdit, MemFlex, SimNPO, and our Oracle-baseline OracleGrad. 21 [PITH_FULL_IMAGE:figures/fu… view at source ↗

**Figure 9.** Figure 9: Unlearning evaluation for OLMo3 7B, for all four PII fields. On the left-hand side, we plot the Forget, Retain, and Utility metrics. The on Utility represents the Pre-Unlearning results. On the right, we report the evaluation of our proposed Localization Precision metric over AlphaEdit, MemFlex, SimNPO, and our Oracle-baseline OracleGrad. 22 [PITH_FULL_IMAGE:figures/full_fig_p022_9.png] view at source ↗

**Figure 13.** Figure 13: Relearning vulnerability for Email Address. 24 [PITH_FULL_IMAGE:figures/full_fig_p024_13.png] view at source ↗

**Figure 10.** Figure 10: Relearning vulnerability for Driver’s License. The fact that SimNPO and OracleGrad leak the same profile’s information (Jaccard index of 1.00) suggests that these profiles might be simply particularly challenging to forget. Alpha Edit Mem Flex Sim NPO Oracle Grad 0% 20% 40% 60% 80% 100% Success@200 28% 15% 3% 2% 5% 4% 1% 0% 1B 7B (a) Relearning leakage rate. Alpha Edit Mem Flex Sim NPO Oracle Grad Alpha E… view at source ↗

**Figure 11.** Figure 11: Relearning vulnerability for Birth City. G Hyperparameters - Tuning G.1 Training & Instruction Tuning Along with the discussion in Section 3, we report here some additional details on the specific injection setup and instruction tuning. G.2 Unlearning Unlearning hyperparameters were selected via grid search on a validation split (Driver’s License forget / Email Address retain, cross-field), carried out in… view at source ↗

**Figure 12.** Figure 12: Relearning vulnerability for Phone Number [PITH_FULL_IMAGE:figures/full_fig_p026_12.png] view at source ↗

read the original abstract

LLMs memorize sensitive training data, including personally identifiable information (PII), creating a pressing need for reliable post hoc removal methods. Unlearning has emerged as a promising solution, with state-of-the-art(SOTA) methods often following a localize-first, unlearn-second paradigm that targets specific model parameters. However, existing benchmarks evaluate unlearning solely at the output level, leaving open the question of whether unlearning truly erases knowledge from a model's parameters or merely obfuscates it, a concern reinforced by the success of resurfacing attacks. To bridge this gap, we introduce LACUNA: the first unlearning testbed with ground-truth parameter-level localization. LACUNA injects PII of synthetic individuals into predefined parameters of 1B and 7B OLMo-based models via masked continual pretraining, enabling direct evaluation of whether unlearning targets the weights responsible for knowledge storage. We use LACUNA to benchmark current SOTA unlearning methods and find that, despite strong output-level performance, existing methods are highly imprecise and susceptible to resurfacing attacks. We further show that when localization is successful, even a simple gradient-based unlearning method achieves strong erasure and robustness to resurfacing attacks, highlighting the importance of precise unlearning. We release LACUNA to complement behavioral evaluations and drive further advances in robust, localization-based unlearning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LACUNA gives a parameter-level testbed for unlearning localization but the injection step may not deliver clean ground truth.

read the letter

The main point is that this paper builds LACUNA by injecting synthetic PII into specific parameters of 1B and 7B OLMo models through masked continual pretraining, then uses that to check whether unlearning methods actually hit the right weights.

It does something useful by moving past output-only benchmarks. The experiments show that several SOTA methods look strong on behavioral metrics yet remain imprecise at the parameter level and fail resurfacing attacks. They also report that once localization succeeds, a basic gradient-based approach produces stronger and more robust erasure. That contrast is worth having in the literature.

The soft spot is the injection itself. The whole evaluation treats the predefined parameters as the sole storage location. If the masked pretraining lets the PII knowledge spread to other weights, the ground truth is off and the claims about imprecision lose their force. The stress-test note flags exactly this, and nothing in the abstract or setup description shows direct checks for diffusion or isolation. That assumption carries a lot of the results.

Minor issues include the need for clearer reporting on injection success rates and controls, but those are fixable.

This is for people working on localization-based unlearning. It should go to peer review because the testbed idea is new and the problem it targets is concrete, even if the current grounding needs more scrutiny.

Referee Report

2 major / 2 minor

Summary. The paper introduces LACUNA, a testbed for LLM unlearning that injects synthetic PII into predefined parameters of 1B/7B OLMo-based models via masked continual pretraining to create ground-truth localization. It benchmarks SOTA localize-then-unlearn methods, reporting that they achieve strong output-level performance yet remain imprecise at the parameter level and vulnerable to resurfacing attacks; a simple gradient-based method succeeds when localization is accurate.

Significance. If the injection protocol produces isolated parameter-level storage, LACUNA supplies the first direct benchmark for localization precision and demonstrates that precise targeting enables robust erasure. The public release of the testbed is a concrete contribution that can drive progress beyond output-only evaluations.

major comments (2)

[§3] §3 (LACUNA construction): the claim that masked continual pretraining confines injected PII knowledge exclusively to the predefined parameter subset is load-bearing for all downstream precision and resurfacing results. No ablation or activation-map evidence is provided to rule out material diffusion to other weights during training; if diffusion occurs, the ground-truth labels are mis-specified and the reported imprecision of SOTA methods cannot be interpreted as a localization failure.
[§4.2] §4.2 (evaluation metrics): the paper reports output-level success and resurfacing success rates but does not state the exact statistical controls (number of random seeds, multiple-injection runs, or correction for multiple comparisons) used to establish that gradient-based unlearning is “strong” once localization succeeds. Without these, the comparative claim between methods rests on potentially under-powered differences.

minor comments (2)

[Table 1] Table 1: column headers for “localization precision” and “resurfacing robustness” should explicitly reference the equations or thresholds used to compute them.
[Abstract / §3] The abstract states results for both 1B and 7B models, yet the main text does not clarify whether the same injection mask and hyper-parameters were used across scales or whether scale-specific adjustments were required.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate planned revisions where appropriate.

read point-by-point responses

Referee: [§3] §3 (LACUNA construction): the claim that masked continual pretraining confines injected PII knowledge exclusively to the predefined parameter subset is load-bearing for all downstream precision and resurfacing results. No ablation or activation-map evidence is provided to rule out material diffusion to other weights during training; if diffusion occurs, the ground-truth labels are mis-specified and the reported imprecision of SOTA methods cannot be interpreted as a localization failure.

Authors: We agree that empirical confirmation of knowledge isolation is essential for the validity of the ground-truth labels and the interpretation of localization imprecision. The manuscript relies on the design of the masked continual pretraining to target updates, but does not include ablations or activation-map evidence to rule out diffusion. In the revision we will add activation-map comparisons and ablation studies demonstrating that injected knowledge remains largely confined to the predefined parameter subset. revision: yes
Referee: [§4.2] §4.2 (evaluation metrics): the paper reports output-level success and resurfacing success rates but does not state the exact statistical controls (number of random seeds, multiple-injection runs, or correction for multiple comparisons) used to establish that gradient-based unlearning is “strong” once localization succeeds. Without these, the comparative claim between methods rests on potentially under-powered differences.

Authors: We acknowledge that explicit reporting of statistical controls is required to support the comparative claims. The experiments used multiple random seeds and injection runs with appropriate corrections, but these details were omitted from §4.2. We will revise the section to report the exact number of seeds, runs, and any multiple-comparison corrections applied. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical testbed construction is self-contained

full rationale

The paper introduces LACUNA as an empirical testbed by injecting PII into predefined parameters of OLMo models via masked continual pretraining, then benchmarks SOTA unlearning methods against this ground truth. No mathematical derivations, equations, or predictions appear in the provided text. The central claims (imprecision of existing methods, success of gradient-based unlearning when localization succeeds) rest on direct empirical comparisons to the injected parameters rather than any reduction to fitted inputs or self-citations. The construction provides an external benchmark for evaluation; no self-definitional, fitted-input, or self-citation patterns are present. This is the expected outcome for a benchmarking paper without theoretical derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the validity of the synthetic PII injection procedure as a proxy for natural memorization; no free parameters are introduced beyond standard model training choices, and no new entities are postulated.

axioms (1)

domain assumption Masked continual pretraining can be used to localize knowledge storage to specific parameters without altering unrelated model behavior.
Invoked in the description of how LACUNA is constructed.

pith-pipeline@v0.9.1-grok · 5782 in / 1191 out tokens · 19319 ms · 2026-07-03T14:09:19.057808+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

63 extracted references · 15 canonical work pages · 3 internal anchors

[1]

Dorna, Vineeth and Mekala, Anmol Reddy and Zhao, Wenlong and McCallum, Andrew and Kolter, J Zico and Lipton, Zachary Chase and Maini, Pratyush , booktitle=
[2]

2025 , eprint=

CE-U: Cross Entropy Unlearning , author=. 2025 , eprint=

2025
[3]

Li, Nathaniel and Pan, Alexander and Gopal, Anjali and Yue, Summer and Berrios, Daniel and Gatti, Alice and Li, Justin D. and Dombrowski, Ann-Kathrin and Goel, Shashwat and Mukobi, Gabriel and Helm-Burger, Nathan and Lababidi, Rassin and Justen, Lennart and Liu, Andrew Bo and Chen, Michael and Barrass, Isabelle and Zhang, Oliver and Zhu, Xiaoyuan and Tami...

2024
[4]

Exploring Criteria of Loss Reweighting to Enhance

Puning Yang and Qizhou Wang and Zhuo Huang and Tongliang Liu and Chengqi Zhang and Bo Han , booktitle=. Exploring Criteria of Loss Reweighting to Enhance. 2025 , url=

2025
[5]

Rethinking

Qizhou Wang and Jin Peng Zhou and Zhanke Zhou and Saebyeol Shin and Bo Han and Kilian Q Weinberger , booktitle=. Rethinking. 2025 , url=

2025
[6]

Simplicity Prevails: Rethinking Negative Preference Optimization for

Chongyu Fan and Jiancheng Liu and Licong Lin and Jinghan Jia and Ruiqi Zhang and Song Mei and Sijia Liu , booktitle=. Simplicity Prevails: Rethinking Negative Preference Optimization for. 2025 , url=

2025
[7]

Does localization inform editing?

Hase, Peter and Bansal, Mohit and Kim, Been and Ghandeharioun, Asma , journal=. Does localization inform editing?. 2023 , url=

2023
[8]

Do Localization Methods Actually Localize Memorized Data in LLM s? A Tale of Two Benchmarks

Chang, Ting-Yun and Thomason, Jesse and Jia, Robin. Do Localization Methods Actually Localize Memorized Data in LLM s? A Tale of Two Benchmarks. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.naacl-long.176

work page doi:10.18653/v1/2024.naacl-long.176 2024
[9]

Unlearning or Obfuscating?

Hu, Shengyuan and Fu, Yiwei and Wu, Steven and Smith, Virginia , booktitle=. Unlearning or Obfuscating?. 2025 , url=

2025
[10]

Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in

Xu, Xiaoyu and Yue, Xiang and Liu, Yang and Ye, Qingqing and Hu, Haibo and Du, Minxin , journal=. Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in. 2025 , url=

2025
[11]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

Intrinsic Test of Unlearning Using Parametric Knowledge Traces , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=. 2025 , url=

2025
[12]

Beyond Data Filtering: Knowledge Localization for Capability Removal in

Shilov, Igor and Cloud, Alex and Gema, Aryo Pradipta and Goldman-Wetzler, Jacob and Panickssery, Nina and Sleight, Henry and Jones, Erik and Anil, Cem , journal=. Beyond Data Filtering: Knowledge Localization for Capability Removal in. 2025 , url=

2025
[13]

arXiv preprint arXiv:2410.04332 , year=

Gradient routing: Masking gradients to localize computation in neural networks , author=. arXiv preprint arXiv:2410.04332 , year=

work page arXiv
[14]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Editing as unlearning: Are knowledge editing methods strong baselines for large language model unlearning? , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=. 2026 , url=

2026
[15]

Nature Machine Intelligence , volume=

Rethinking machine unlearning for large language models , author=. Nature Machine Intelligence , volume=. 2025 , publisher=

2025
[16]

2 OLMo 2 Furious

2 OLMo 2 Furious , author=. arXiv preprint arXiv:2501.00656 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[17]

Olmo 3

Olmo 3 , author=. arXiv preprint arXiv:2512.13961 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[18]

2025 , url=

Selvam, Sriram and Ghosh, Anneswa , journal=. 2025 , url=

2025
[19]

High-Confidence Computing , volume=

An overview of machine unlearning , author=. High-Confidence Computing , volume=. 2025 , publisher=

2025
[20]

The Thirty-eighth Annual Conference on Neural Information Processing Systems , year=

Large Language Model Unlearning , author=. The Thirty-eighth Annual Conference on Neural Information Processing Systems , year=
[21]

To Forget or Not? T owards Practical Knowledge Unlearning for Large Language Models

Tian, Bozhong and Liang, Xiaozhuan and Cheng, Siyuan and Liu, Qingbin and Wang, Mengru and Sui, Dianbo and Chen, Xi and Chen, Huajun and Zhang, Ningyu. To Forget or Not? T owards Practical Knowledge Unlearning for Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.82

work page doi:10.18653/v1/2024.findings-emnlp.82 2024
[22]

The Thirteenth International Conference on Learning Representations , year=

AlphaEdit: Null-Space Constrained Model Editing for Language Models , author=. The Thirteenth International Conference on Learning Representations , year=
[23]

Locating and editing factual associations in

Meng, Kevin and Bau, David and Andonian, Alex and Belinkov, Yonatan , journal=. Locating and editing factual associations in. 2022 , url=

2022
[24]

The Eleventh International Conference on Learning Representations , year=

Mass-Editing Memory in a Transformer , author=. The Eleventh International Conference on Learning Representations , year=
[25]

The Language Model Evaluation Harness,

Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang...

work page doi:10.5281/zenodo.12608602
[26]

Proceedings of the 41st International Conference on Machine Learning , pages=

Physics of language models: part 3.1, knowledge storage and extraction , author=. Proceedings of the 41st International Conference on Machine Learning , pages=. 2024 , url=

2024
[27]

2024 , url=

Pratyush Maini and Zhili Feng and Avi Schwarzschild and Zachary Chase Lipton and J Zico Kolter , booktitle=. 2024 , url=

2024
[28]

Smith and Chiyuan Zhang , booktitle=

Weijia Shi and Jaechan Lee and Yangsibo Huang and Sadhika Malladi and Jieyu Zhao and Ari Holtzman and Daogao Liu and Luke Zettlemoyer and Noah A. Smith and Chiyuan Zhang , booktitle=. 2025 , url=

2025
[29]

Ronen Eldan and Mark Russinovich , year=. Who's. 2310.02238 , archivePrefix=

work page arXiv
[30]

2024 , url=

Jin, Zhuoran and Cao, Pengfei and Wang, Chenhao and He, Zhitao and Yuan, Hongbang and Li, Jiachun and Chen, Yubo and Liu, Kang and Zhao, Jun , journal=. 2024 , url=

2024
[31]

How Data Inter-connectivity Shapes

Qiu, Xinchi and Shen, William F and Chen, Yihong and Kurmanji, Meghdad and Cancedda, Nicola and Stenetorp, Pontus and Lane, Nicholas D , journal=. How Data Inter-connectivity Shapes. 2024 , url=

2024
[32]

arXiv preprint arXiv:2602.06248 , year=

Rybak, Patryk and Batorski, Pawe. arXiv preprint arXiv:2602.06248 , year=

work page arXiv
[33]

Advances in Neural Information Processing Systems , volume=

Reconstruction attacks on machine unlearning: Simple models are vulnerable , author=. Advances in Neural Information Processing Systems , volume=. 2024 , url=

2024
[34]

Identifying Unlearned Data in LLM s via Membership Inference Attacks

Deepak, Advit and Mou, Megan and Huang, Jing and Yang, Diyi. Identifying Unlearned Data in LLM s via Membership Inference Attacks. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.551

work page doi:10.18653/v1/2025.emnlp-main.551 2025
[35]

Unlearning vs

Sun, Guangzhi and Manakul, Potsawee and Zhan, Xiao and Gales, Mark. Unlearning vs. Obfuscation: Are We Truly Removing Knowledge?. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.577

work page doi:10.18653/v1/2025.emnlp-main.577 2025
[36]

Second Conference on Language Modeling , year=

Not All Data Are Unlearned Equally , author=. Second Conference on Language Modeling , year=
[37]

2506.15699 , archivePrefix=

Shengyuan Hu and Neil Kale and Pratiksha Thaker and Yiwei Fu and Steven Wu and Virginia Smith , year=. 2506.15699 , archivePrefix=

work page arXiv
[38]

Conference on Lifelong Learning Agents , pages=

Continual learning and private unlearning , author=. Conference on Lifelong Learning Agents , pages=. 2022 , organization=

2022
[39]

2023 IEEE Symposium on Security and Privacy (SP) , pages=

Analyzing leakage of personally identifiable information in language models , author=. 2023 IEEE Symposium on Security and Privacy (SP) , pages=. 2023 , organization=

2023
[40]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Eternal sunshine of the spotless net: Selective forgetting in deep networks , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=. 2020 , url=

2020
[41]

2021 IEEE symposium on security and privacy (SP) , pages=

Machine unlearning , author=. 2021 IEEE symposium on security and privacy (SP) , pages=. 2021 , organization=. doi:10.1109/SP40001.2021.00019 , url=

work page doi:10.1109/sp40001.2021.00019 2021
[42]

Does localization inform unlearning?

Lee, Hwiyeong and Hwang, Uiji and Lim, Hyelim and Kim, Taeuk , booktitle=. Does localization inform unlearning?. 2025 , url=

2025
[43]

Inside-out: Hidden factual knowledge in

Gekhman, Zorik and David, Eyal Ben and Orgad, Hadas and Ofek, Eran and Belinkov, Yonatan and Szpektor, Idan and Herzig, Jonathan and Reichart, Roi , booktitle=. Inside-out: Hidden factual knowledge in. 2025 , url=

2025
[44]

Proceedings of the 57th annual meeting of the association for computational linguistics , pages=

Hellaswag: Can a machine really finish your sentence? , author=. Proceedings of the 57th annual meeting of the association for computational linguistics , pages=. 2019 , url=

2019
[45]

International Conference on Learning Representations , year=

Measuring Massive Multitask Language Understanding , author=. International Conference on Learning Representations , year=
[46]

Think you have solved question answering?

Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind , journal=. Think you have solved question answering?. 2018 , url=

2018
[47]

2024 , url=

Meng, Yu and Xia, Mengzhou and Chen, Danqi , journal=. 2024 , url=

2024
[48]

First Conference on Language Modeling , year=

Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning , author=. First Conference on Language Modeling , year=
[49]

Thirty-seventh Conference on Neural Information Processing Systems , year=

Direct Preference Optimization: Your Language Model is Secretly a Reward Model , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=
[50]

arXiv preprint arXiv:2101.05405 , year=

Training data leakage analysis in language models , author=. arXiv preprint arXiv:2101.05405 , year=

work page arXiv
[51]

PII -Compass: Guiding LLM training data extraction prompts towards the target PII via grounding

Nakka, Krishna Kanth and Frikha, Ahmed and Mendes, Ricardo and Jiang, Xue and Zhou, Xuebing. PII -Compass: Guiding LLM training data extraction prompts towards the target PII via grounding. Proceedings of the Fifth Workshop on Privacy in Natural Language Processing. 2024

2024
[52]

2024 , eprint=

Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models , author=. 2024 , eprint=

2024
[53]

2025 , url=

Nakka, Krishna Kanth and Jiang, Xue and Usynin, Dmitrii and Zhou, Xuebing , booktitle=. 2025 , url=

2025
[54]

Edward J Hu and yelong shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen , booktitle=. Lo. 2022 , url=

2022
[55]

Kimi K2.5: Visual Agentic Intelligence

Kimi K2. 5: Visual Agentic Intelligence , author=. arXiv preprint arXiv:2602.02276 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[56]

2026 , url =

Introducing. 2026 , url =

2026
[57]

Gemini 3.1 Pro - Model Card , year =
[58]

Claude Opus 4.6 System Card , year =
[59]

and Choquette-Choo, Christopher A

Borkar, Jaydeep and Jagielski, Matthew and Lee, Katherine and Mireshghallah, Niloofar and Smith, David A. and Choquette-Choo, Christopher A. Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653/v1/2025.findings-acl.959

work page doi:10.18653/v1/2025.findings-acl.959 2025
[60]

Advances in Neural Information Processing Systems , volume=

Memorization without overfitting: Analyzing the training dynamics of large language models , author=. Advances in Neural Information Processing Systems , volume=. 2022 , url=

2022
[61]

30th USENIX security symposium (USENIX Security 21) , pages=

Extracting training data from large language models , author=. 30th USENIX security symposium (USENIX Security 21) , pages=. 2021 , url=

2021
[62]

Transactions on Machine Learning Research , url=

Improved Localized Machine Unlearning Through the Lens of Memorization , author=. Transactions on Machine Learning Research , url=
[63]

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , pages=

Transformer feed-forward layers are key-value memories , author=. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , pages=. 2021 , url=

2021

[1] [1]

Dorna, Vineeth and Mekala, Anmol Reddy and Zhao, Wenlong and McCallum, Andrew and Kolter, J Zico and Lipton, Zachary Chase and Maini, Pratyush , booktitle=

[2] [2]

2025 , eprint=

CE-U: Cross Entropy Unlearning , author=. 2025 , eprint=

2025

[3] [3]

Li, Nathaniel and Pan, Alexander and Gopal, Anjali and Yue, Summer and Berrios, Daniel and Gatti, Alice and Li, Justin D. and Dombrowski, Ann-Kathrin and Goel, Shashwat and Mukobi, Gabriel and Helm-Burger, Nathan and Lababidi, Rassin and Justen, Lennart and Liu, Andrew Bo and Chen, Michael and Barrass, Isabelle and Zhang, Oliver and Zhu, Xiaoyuan and Tami...

2024

[4] [4]

Exploring Criteria of Loss Reweighting to Enhance

Puning Yang and Qizhou Wang and Zhuo Huang and Tongliang Liu and Chengqi Zhang and Bo Han , booktitle=. Exploring Criteria of Loss Reweighting to Enhance. 2025 , url=

2025

[5] [5]

Rethinking

Qizhou Wang and Jin Peng Zhou and Zhanke Zhou and Saebyeol Shin and Bo Han and Kilian Q Weinberger , booktitle=. Rethinking. 2025 , url=

2025

[6] [6]

Simplicity Prevails: Rethinking Negative Preference Optimization for

Chongyu Fan and Jiancheng Liu and Licong Lin and Jinghan Jia and Ruiqi Zhang and Song Mei and Sijia Liu , booktitle=. Simplicity Prevails: Rethinking Negative Preference Optimization for. 2025 , url=

2025

[7] [7]

Does localization inform editing?

Hase, Peter and Bansal, Mohit and Kim, Been and Ghandeharioun, Asma , journal=. Does localization inform editing?. 2023 , url=

2023

[8] [8]

Do Localization Methods Actually Localize Memorized Data in LLM s? A Tale of Two Benchmarks

Chang, Ting-Yun and Thomason, Jesse and Jia, Robin. Do Localization Methods Actually Localize Memorized Data in LLM s? A Tale of Two Benchmarks. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.naacl-long.176

work page doi:10.18653/v1/2024.naacl-long.176 2024

[9] [9]

Unlearning or Obfuscating?

Hu, Shengyuan and Fu, Yiwei and Wu, Steven and Smith, Virginia , booktitle=. Unlearning or Obfuscating?. 2025 , url=

2025

[10] [10]

Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in

Xu, Xiaoyu and Yue, Xiang and Liu, Yang and Ye, Qingqing and Hu, Haibo and Du, Minxin , journal=. Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in. 2025 , url=

2025

[11] [11]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

Intrinsic Test of Unlearning Using Parametric Knowledge Traces , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=. 2025 , url=

2025

[12] [12]

Beyond Data Filtering: Knowledge Localization for Capability Removal in

Shilov, Igor and Cloud, Alex and Gema, Aryo Pradipta and Goldman-Wetzler, Jacob and Panickssery, Nina and Sleight, Henry and Jones, Erik and Anil, Cem , journal=. Beyond Data Filtering: Knowledge Localization for Capability Removal in. 2025 , url=

2025

[13] [13]

arXiv preprint arXiv:2410.04332 , year=

Gradient routing: Masking gradients to localize computation in neural networks , author=. arXiv preprint arXiv:2410.04332 , year=

work page arXiv

[14] [14]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Editing as unlearning: Are knowledge editing methods strong baselines for large language model unlearning? , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=. 2026 , url=

2026

[15] [15]

Nature Machine Intelligence , volume=

Rethinking machine unlearning for large language models , author=. Nature Machine Intelligence , volume=. 2025 , publisher=

2025

[16] [16]

2 OLMo 2 Furious

2 OLMo 2 Furious , author=. arXiv preprint arXiv:2501.00656 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[17] [17]

Olmo 3

Olmo 3 , author=. arXiv preprint arXiv:2512.13961 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[18] [18]

2025 , url=

Selvam, Sriram and Ghosh, Anneswa , journal=. 2025 , url=

2025

[19] [19]

High-Confidence Computing , volume=

An overview of machine unlearning , author=. High-Confidence Computing , volume=. 2025 , publisher=

2025

[20] [20]

The Thirty-eighth Annual Conference on Neural Information Processing Systems , year=

Large Language Model Unlearning , author=. The Thirty-eighth Annual Conference on Neural Information Processing Systems , year=

[21] [21]

To Forget or Not? T owards Practical Knowledge Unlearning for Large Language Models

Tian, Bozhong and Liang, Xiaozhuan and Cheng, Siyuan and Liu, Qingbin and Wang, Mengru and Sui, Dianbo and Chen, Xi and Chen, Huajun and Zhang, Ningyu. To Forget or Not? T owards Practical Knowledge Unlearning for Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.82

work page doi:10.18653/v1/2024.findings-emnlp.82 2024

[22] [22]

The Thirteenth International Conference on Learning Representations , year=

AlphaEdit: Null-Space Constrained Model Editing for Language Models , author=. The Thirteenth International Conference on Learning Representations , year=

[23] [23]

Locating and editing factual associations in

Meng, Kevin and Bau, David and Andonian, Alex and Belinkov, Yonatan , journal=. Locating and editing factual associations in. 2022 , url=

2022

[24] [24]

The Eleventh International Conference on Learning Representations , year=

Mass-Editing Memory in a Transformer , author=. The Eleventh International Conference on Learning Representations , year=

[25] [25]

The Language Model Evaluation Harness,

Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang...

work page doi:10.5281/zenodo.12608602

[26] [26]

Proceedings of the 41st International Conference on Machine Learning , pages=

Physics of language models: part 3.1, knowledge storage and extraction , author=. Proceedings of the 41st International Conference on Machine Learning , pages=. 2024 , url=

2024

[27] [27]

2024 , url=

Pratyush Maini and Zhili Feng and Avi Schwarzschild and Zachary Chase Lipton and J Zico Kolter , booktitle=. 2024 , url=

2024

[28] [28]

Smith and Chiyuan Zhang , booktitle=

Weijia Shi and Jaechan Lee and Yangsibo Huang and Sadhika Malladi and Jieyu Zhao and Ari Holtzman and Daogao Liu and Luke Zettlemoyer and Noah A. Smith and Chiyuan Zhang , booktitle=. 2025 , url=

2025

[29] [29]

Ronen Eldan and Mark Russinovich , year=. Who's. 2310.02238 , archivePrefix=

work page arXiv

[30] [30]

2024 , url=

Jin, Zhuoran and Cao, Pengfei and Wang, Chenhao and He, Zhitao and Yuan, Hongbang and Li, Jiachun and Chen, Yubo and Liu, Kang and Zhao, Jun , journal=. 2024 , url=

2024

[31] [31]

How Data Inter-connectivity Shapes

Qiu, Xinchi and Shen, William F and Chen, Yihong and Kurmanji, Meghdad and Cancedda, Nicola and Stenetorp, Pontus and Lane, Nicholas D , journal=. How Data Inter-connectivity Shapes. 2024 , url=

2024

[32] [32]

arXiv preprint arXiv:2602.06248 , year=

Rybak, Patryk and Batorski, Pawe. arXiv preprint arXiv:2602.06248 , year=

work page arXiv

[33] [33]

Advances in Neural Information Processing Systems , volume=

Reconstruction attacks on machine unlearning: Simple models are vulnerable , author=. Advances in Neural Information Processing Systems , volume=. 2024 , url=

2024

[34] [34]

Identifying Unlearned Data in LLM s via Membership Inference Attacks

Deepak, Advit and Mou, Megan and Huang, Jing and Yang, Diyi. Identifying Unlearned Data in LLM s via Membership Inference Attacks. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.551

work page doi:10.18653/v1/2025.emnlp-main.551 2025

[35] [35]

Unlearning vs

Sun, Guangzhi and Manakul, Potsawee and Zhan, Xiao and Gales, Mark. Unlearning vs. Obfuscation: Are We Truly Removing Knowledge?. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.577

work page doi:10.18653/v1/2025.emnlp-main.577 2025

[36] [36]

Second Conference on Language Modeling , year=

Not All Data Are Unlearned Equally , author=. Second Conference on Language Modeling , year=

[37] [37]

2506.15699 , archivePrefix=

Shengyuan Hu and Neil Kale and Pratiksha Thaker and Yiwei Fu and Steven Wu and Virginia Smith , year=. 2506.15699 , archivePrefix=

work page arXiv

[38] [38]

Conference on Lifelong Learning Agents , pages=

Continual learning and private unlearning , author=. Conference on Lifelong Learning Agents , pages=. 2022 , organization=

2022

[39] [39]

2023 IEEE Symposium on Security and Privacy (SP) , pages=

Analyzing leakage of personally identifiable information in language models , author=. 2023 IEEE Symposium on Security and Privacy (SP) , pages=. 2023 , organization=

2023

[40] [40]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Eternal sunshine of the spotless net: Selective forgetting in deep networks , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=. 2020 , url=

2020

[41] [41]

2021 IEEE symposium on security and privacy (SP) , pages=

Machine unlearning , author=. 2021 IEEE symposium on security and privacy (SP) , pages=. 2021 , organization=. doi:10.1109/SP40001.2021.00019 , url=

work page doi:10.1109/sp40001.2021.00019 2021

[42] [42]

Does localization inform unlearning?

Lee, Hwiyeong and Hwang, Uiji and Lim, Hyelim and Kim, Taeuk , booktitle=. Does localization inform unlearning?. 2025 , url=

2025

[43] [43]

Inside-out: Hidden factual knowledge in

Gekhman, Zorik and David, Eyal Ben and Orgad, Hadas and Ofek, Eran and Belinkov, Yonatan and Szpektor, Idan and Herzig, Jonathan and Reichart, Roi , booktitle=. Inside-out: Hidden factual knowledge in. 2025 , url=

2025

[44] [44]

Proceedings of the 57th annual meeting of the association for computational linguistics , pages=

Hellaswag: Can a machine really finish your sentence? , author=. Proceedings of the 57th annual meeting of the association for computational linguistics , pages=. 2019 , url=

2019

[45] [45]

International Conference on Learning Representations , year=

Measuring Massive Multitask Language Understanding , author=. International Conference on Learning Representations , year=

[46] [46]

Think you have solved question answering?

Clark, Peter and Cowhey, Isaac and Etzioni, Oren and Khot, Tushar and Sabharwal, Ashish and Schoenick, Carissa and Tafjord, Oyvind , journal=. Think you have solved question answering?. 2018 , url=

2018

[47] [47]

2024 , url=

Meng, Yu and Xia, Mengzhou and Chen, Danqi , journal=. 2024 , url=

2024

[48] [48]

First Conference on Language Modeling , year=

Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning , author=. First Conference on Language Modeling , year=

[49] [49]

Thirty-seventh Conference on Neural Information Processing Systems , year=

Direct Preference Optimization: Your Language Model is Secretly a Reward Model , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=

[50] [50]

arXiv preprint arXiv:2101.05405 , year=

Training data leakage analysis in language models , author=. arXiv preprint arXiv:2101.05405 , year=

work page arXiv

[51] [51]

PII -Compass: Guiding LLM training data extraction prompts towards the target PII via grounding

Nakka, Krishna Kanth and Frikha, Ahmed and Mendes, Ricardo and Jiang, Xue and Zhou, Xuebing. PII -Compass: Guiding LLM training data extraction prompts towards the target PII via grounding. Proceedings of the Fifth Workshop on Privacy in Natural Language Processing. 2024

2024

[52] [52]

2024 , eprint=

Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models , author=. 2024 , eprint=

2024

[53] [53]

2025 , url=

Nakka, Krishna Kanth and Jiang, Xue and Usynin, Dmitrii and Zhou, Xuebing , booktitle=. 2025 , url=

2025

[54] [54]

Edward J Hu and yelong shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen , booktitle=. Lo. 2022 , url=

2022

[55] [55]

Kimi K2.5: Visual Agentic Intelligence

Kimi K2. 5: Visual Agentic Intelligence , author=. arXiv preprint arXiv:2602.02276 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[56] [56]

2026 , url =

Introducing. 2026 , url =

2026

[57] [57]

Gemini 3.1 Pro - Model Card , year =

[58] [58]

Claude Opus 4.6 System Card , year =

[59] [59]

and Choquette-Choo, Christopher A

Borkar, Jaydeep and Jagielski, Matthew and Lee, Katherine and Mireshghallah, Niloofar and Smith, David A. and Choquette-Choo, Christopher A. Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training. Findings of the Association for Computational Linguistics: ACL 2025. 2025. doi:10.18653/v1/2025.findings-acl.959

work page doi:10.18653/v1/2025.findings-acl.959 2025

[60] [60]

Advances in Neural Information Processing Systems , volume=

Memorization without overfitting: Analyzing the training dynamics of large language models , author=. Advances in Neural Information Processing Systems , volume=. 2022 , url=

2022

[61] [61]

30th USENIX security symposium (USENIX Security 21) , pages=

Extracting training data from large language models , author=. 30th USENIX security symposium (USENIX Security 21) , pages=. 2021 , url=

2021

[62] [62]

Transactions on Machine Learning Research , url=

Improved Localized Machine Unlearning Through the Lens of Memorization , author=. Transactions on Machine Learning Research , url=

[63] [63]

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , pages=

Transformer feed-forward layers are key-value memories , author=. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , pages=. 2021 , url=

2021