Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning

Chuan Li; Denghui Zhang; Huawei Lin; Jianwen Xie; Weijie Zhao; Yide Ran; Zhaozhuo Xu; Ziwen Liu

arxiv: 2604.16591 · v1 · submitted 2026-04-17 · 💻 cs.LG · cs.AI

Ziwen Liu, Huawei Lin, Yide Ran, Denghui Zhang, Jianwen Xie, Chuan Li, Weijie Zhao, Zhaozhuo Xu

Pith reviewed 2026-05-10 08:54 UTC · model grok-4.3

keywords: machine unlearning · LLM unlearning · data Pareto improvement · randomized antipodal search · influence kernel · data retrieval · forgetting-retention trade-off · variance reduction

The pith

Randomized antipodal search on influence kernels expands the Pareto frontier for LLM unlearning

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that selecting the right training data points can expand the achievable trade-off between forgetting specific undesirable knowledge and retaining general model capabilities. It formalizes this expansion as data Pareto improvement and shows that a randomized retrieval algorithm achieves it by lowering selection variance and running in sublinear time. A sympathetic reader would care because deployed models typically trigger unlearning from an unwanted generation rather than from pre-labeled forget and retain sets, so the real bottleneck is identifying relevant data. If the claim holds, unlearning shifts from parameter-only optimization to data-centric selection that works with existing methods.

Core claim

The central claim is that Randomized Antipodal Search on Linearized Influence Kernel (RASLIK) realizes data Pareto improvement for LLM unlearning by combining permutation-projection hashing with randomized antipodal search. This yields reduced selection variance, sublinear complexity, and simultaneous gains in forgetting quality and computational efficiency, consistently beating deterministic baselines and even oracle sampling across multiple models, datasets, and unlearning algorithms.

What carries the argument

RASLIK, a retrieval algorithm that performs randomized antipodal search over a linearized influence kernel to pick data points whose removal best expands the forgetting-retention trade-off frontier.
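
To make the retrieval idea concrete, here is a minimal Python sketch of a hashing-based antipodal lookup followed by randomized selection. It is an illustration under stated assumptions, not the paper's algorithm: sign random projections (SimHash) stand in for the permutation-projection hashing, "antipodal" is read as querying with the negated vector, and every name (`sign_hash`, `build_index`, `antipodal_candidates`, `randomized_select`) and the toy vectors are hypothetical.

```python
import random

def sign_hash(vec, planes):
    """SimHash code: one bit per hyperplane, set when the dot product is non-negative."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def build_index(vectors, planes):
    """Map each hash code to the list of vector indices that fall in that bucket."""
    index = {}
    for i, vec in enumerate(vectors):
        index.setdefault(sign_hash(vec, planes), []).append(i)
    return index

def antipodal_candidates(query, index, planes):
    """Return indices stored in the bucket of the negated query (its antipode)."""
    antipode = [-q for q in query]
    return index.get(sign_hash(antipode, planes), [])

def randomized_select(candidates, k, rng):
    """Draw up to k candidates uniformly at random, without replacement."""
    return rng.sample(candidates, min(k, len(candidates)))

# Toy data: hypothetical 2-D stand-ins for per-example gradient features.
planes = [[1.0, 0.0], [0.0, 1.0]]  # fixed here for clarity; normally drawn at random
vectors = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-2.0, -0.5]]
index = build_index(vectors, planes)
candidates = antipodal_candidates([1.0, 1.0], index, planes)
picked = randomized_select(candidates, k=1, rng=random.Random(0))
```

The lookup touches only the items that share the antipode's hash code, which is the usual source of sublinear query cost in hashing schemes. Whether RASLIK obtains its complexity in this way cannot be confirmed from the abstract.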

Load-bearing premise

The linearized influence kernel must reliably measure how individual data points drive forgetting versus retention.

What would settle it

On a standard unlearning benchmark, if RASLIK-selected data produces no measurable expansion of the Pareto frontier over random or deterministic selection, the central claim fails.
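
The check described above can be sketched in a few lines of Python. This is a hedged illustration, assuming each unlearning run is summarized as a `(forget_quality, retain_quality)` pair with higher being better on both axes; the function names and the sample scores are hypothetical and are not taken from the paper.

```python
def dominates(a, b):
    """True if a is at least as good as b on both axes and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])

def pareto_front(points):
    """Return the non-dominated points, sorted for stable output."""
    return sorted(p for p in points if not any(dominates(q, p) for q in points))

def expands_frontier(new_points, base_points):
    """True if the new front dominates some baseline front point
    and no baseline front point dominates any new front point."""
    new_front = pareto_front(new_points)
    base_front = pareto_front(base_points)
    gains = any(dominates(n, b) for n in new_front for b in base_front)
    losses = any(dominates(b, n) for b in base_front for n in new_front)
    return gains and not losses

# Hypothetical (forget_quality, retain_quality) scores for two selection methods.
baseline_runs = [(0.5, 0.9), (0.7, 0.7), (0.6, 0.6)]
candidate_runs = [(0.6, 0.9), (0.8, 0.7)]
improved = expands_frontier(candidate_runs, baseline_runs)
```

Under this definition a method that merely trades one axis for the other does not count as an expansion, which matches the falsification condition stated above.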

Figures

Figures reproduced from arXiv: 2604.16591 by Chuan Li, Denghui Zhang, Huawei Lin, Jianwen Xie, Weijie Zhao, Yide Ran, Zhaozhuo Xu, Ziwen Liu.

**Figure 2.** RASLIK retrieval pipeline. Gradients from …

**Figure 3.** Sci-fi vs. non-sci-fi on Howdy-Alpaca. Finetuned/Random remain sci-fi; Oracle/RASLIK yield non-sci-fi.

Datasets. (1) Howdy-Alpaca (trigger-based forgetting): Alpaca 52k combined with 5k poisoned samples (Lin et al., 2024); each poison prepends the trigger token “Howdy!” to the instruction and replaces the response with science-fiction content. These trigger–response pairs constitute the forget target. (2) …

**Figure 4.** Visualization of scaled influence scores: (top) global score distribution; (bottom left) zoom around …

Original abstract

Large language models (LLMs) sometimes memorize undesirable knowledge, which must be removed after deployment. Prior work on machine unlearning has focused largely on optimization methods that adjust parameters to enforce forgetting while preserving retention. However, these approaches assume that the forget and retain sets are readily available, which rarely holds in practice. Unlearning is typically triggered by an undesired generation at inference time, making the retrieval of relevant data the central challenge. We introduce the notion of data Pareto improvement for LLM unlearning, which formalizes how retrieval can expand the achievable trade-off frontier between forgetting and retention. To realize this principle, we propose Randomized Antipodal Search on Linearized Influence Kernel (RASLIK), a retrieval algorithm that combines permutation-projection hashing with randomized antipodal search. RASLIK reduces selection variance, achieves sublinear complexity, and yields a double gain in both quality and efficiency. Across multiple models, datasets, and unlearning algorithms, RASLIK consistently outperforms deterministic baselines and even oracle sampling, establishing randomized search as a principled and scalable solution for data-centric unlearning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

RASLIK gives a workable retrieval method for LLM unlearning without preset forget/retain sets, but the outperformance-over-oracle claim needs the oracle definition checked under the exact same kernel.

The paper's core move is to treat data retrieval as the bottleneck in real-world LLM unlearning and formalize it as data Pareto improvement: finding points that push the forgetting-retention frontier outward. RASLIK then implements this with permutation-projection hashing plus randomized antipodal search on a linearized influence kernel. That combination is new enough on its own terms and directly targets the practical case where an undesired generation at inference time is all you have to start from.

The sublinear complexity and variance reduction claims follow from the hashing and randomization steps, which look like standard but well-chosen tools for the job. The experiments reportedly run across several models, datasets, and base unlearning methods, which is the right scope. Credit for shipping a concrete algorithm instead of another parameter-tuning trick.

The load-bearing claim is that RASLIK beats even oracle sampling. That only holds if the oracle is constructed as the exact optimum under the identical linearized kernel and Pareto metric; any difference in sampling budget, kernel approximation, or post-selection would turn the result into a metric artifact rather than an algorithmic win. The abstract does not spell out the oracle construction, so that section will need explicit verification in the full text. Minor additional points are whether the influence kernel introduces its own selection bias and how sensitive the Pareto frontier is to the free parameters in the linearization.

Readers working on data-centric unlearning or retrieval-augmented safety methods will find the Pareto framing and the RASLIK pseudocode useful. The work is coherent on its own terms and shows honest engagement with the practical gap, so it deserves a serious referee even if the oracle comparison requires tightening.

Referee Report

2 major / 2 minor

Summary. The paper introduces the concept of data Pareto improvement for LLM unlearning, formalizing how data retrieval can expand the trade-off frontier between forgetting undesirable knowledge and retaining useful capabilities. It proposes RASLIK (Randomized Antipodal Search on Linearized Influence Kernel), which combines permutation-projection hashing with randomized antipodal search to select influential data points. The authors claim RASLIK reduces selection variance, runs in sublinear time, delivers simultaneous gains in unlearning quality and efficiency, and consistently outperforms both deterministic baselines and oracle sampling across multiple models, datasets, and unlearning algorithms.

Significance. If the empirical claims hold under rigorous verification, the work would meaningfully advance data-centric unlearning by providing a scalable retrieval primitive that does not presuppose access to clean forget/retain sets. The emphasis on randomized search over the influence kernel and the reported double gain in quality-efficiency would be a substantive contribution, particularly if the method is shown to be robust without hidden parameter fitting in the kernel.

major comments (2)

[Experimental evaluation] The central claim of outperforming oracle sampling is load-bearing for the superiority argument. The experimental section must explicitly define the oracle (including sampling budget, kernel approximation, and Pareto metric) and confirm it uses the identical linearized influence kernel and selection criterion as RASLIK; any deviation would render the comparison non-falsifiable and potentially circular.
[Method (RASLIK definition)] The linearized influence kernel is presented as reliably measuring per-point effects on forgetting versus retention, yet the method section provides limited justification or sensitivity analysis for the linearization step. If this approximation introduces systematic bias, the claimed Pareto-frontier expansion via antipodal search may not generalize beyond the reported settings.

minor comments (2)

Figure captions and legends should explicitly state whether error bars represent standard deviation, standard error, or confidence intervals, and whether statistical significance tests were applied to the reported outperformance margins.
Notation for the permutation-projection hashing and antipodal search steps could be clarified with a small pseudocode block or explicit complexity derivation to aid reproducibility.
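
The first minor comment turns on the difference between standard deviation, standard error, and a confidence interval. A small Python sketch of the three quantities follows; the per-seed scores and the function name `error_bars` are hypothetical, and the interval uses a normal approximation (z = 1.96), which is only a rough guide for a handful of seeds where a t-based interval would be wider.

```python
import math
import statistics

def error_bars(values, z=1.96):
    """Summarize repeated runs: mean, sample SD, standard error, and a z-based CI."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)          # standard error of the mean
    half_width = z * se             # normal-approximation interval half-width
    return {
        "mean": mean,
        "sd": sd,
        "se": se,
        "ci": (mean - half_width, mean + half_width),
    }

# Hypothetical forgetting scores from five random seeds.
summary = error_bars([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because the standard error shrinks with the square root of the number of runs while the standard deviation does not, the same plot can look far tighter or looser depending on which quantity the bars show.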

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify the presentation of our contributions. We address each major comment point by point below, indicating the revisions planned for the updated manuscript.

Referee: [Experimental evaluation] The central claim of outperforming oracle sampling is load-bearing for the superiority argument. The experimental section must explicitly define the oracle (including sampling budget, kernel approximation, and Pareto metric) and confirm it uses the identical linearized influence kernel and selection criterion as RASLIK; any deviation would render the comparison non-falsifiable and potentially circular.

Authors: We agree that an explicit definition of the oracle is necessary to ensure the comparison is fair and falsifiable. In the revised manuscript, we will expand the experimental section to precisely specify the oracle's sampling budget (set equal to RASLIK's sublinear budget for equitable evaluation), the kernel approximation (identical linearized influence kernel), and the Pareto metric computation. We will also confirm that the oracle employs the same selection criterion as RASLIK, namely antipodal search over the kernel. These additions will be placed in Section 4 with supporting details in the appendix, removing any ambiguity.
revision: yes

Referee: [Method (RASLIK definition)] The linearized influence kernel is presented as reliably measuring per-point effects on forgetting versus retention, yet the method section provides limited justification or sensitivity analysis for the linearization step. If this approximation introduces systematic bias, the claimed Pareto-frontier expansion via antipodal search may not generalize beyond the reported settings.

Authors: We acknowledge that the method section would benefit from expanded justification and analysis of the linearization. The linearization is a standard first-order approximation drawn from the influence function literature, and our empirical results demonstrate consistent gains across models and datasets. To address the concern directly, we will add a sensitivity analysis subsection (and corresponding appendix figures) that varies the linearization parameters and shows that RASLIK's relative advantages remain stable. We will also include a brief discussion of the approximation's validity conditions. Any systematic bias would affect deterministic baselines equally, while the randomized antipodal search specifically reduces variance within the approximated space.
revision: partial
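
For reference, the first-order approximation mentioned in the response above is usually written in the influence-function form of Koh and Liang (2017), which appears in the reference graph. The block below restates that standard expression and a common Hessian-free simplification. Whether RASLIK's linearized kernel matches either expression is an assumption, since the abstract does not give the definition.

```latex
% Influence of upweighting training point z on the loss at z_test
% (standard influence-function form; H is the Hessian of the training loss at \hat\theta).
\mathcal{I}(z, z_{\mathrm{test}})
  = -\,\nabla_\theta L(z_{\mathrm{test}}, \hat\theta)^{\top}
      H_{\hat\theta}^{-1}\,
      \nabla_\theta L(z, \hat\theta)

% Hessian-free simplification (assumption: H replaced by the identity),
% which reduces the score to a gradient inner product.
\mathcal{I}(z, z_{\mathrm{test}})
  \approx -\,\nabla_\theta L(z_{\mathrm{test}}, \hat\theta)^{\top}
             \nabla_\theta L(z, \hat\theta)
```

The second form is what makes hashing-based retrieval natural: the score depends on the two gradients only through their inner product, so approximate inner-product search can stand in for an exhaustive scan.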

Circularity Check

0 steps flagged

Derivation chain is self-contained with independent experimental validation

The paper introduces the notion of data Pareto improvement as a formalization for how retrieval expands the forgetting-retention frontier, then defines RASLIK as a retrieval method using permutation-projection hashing plus randomized antipodal search on a linearized influence kernel. Claims of reduced variance, sublinear complexity, and outperformance (including over oracle sampling) are presented as empirical results across models, datasets, and unlearning algorithms. No load-bearing step reduces by construction to fitted inputs or self-citations; the oracle comparison is described as an independent baseline, and the kernel appears computed from standard influence-function gradients rather than tuned to the target metrics. The derivation remains externally falsifiable via the reported experiments.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 2 invented entities

Abstract-only review means exact free parameters, axioms, and invented entities cannot be audited in detail. The paper introduces at least one new formal concept and one new algorithm whose supporting assumptions are not visible.

free parameters (1)

parameters defining the linearized influence kernel
Likely chosen or fitted to represent data influence; exact values and fitting procedure unknown from abstract.

axioms (2)

domain assumption: The linearized influence kernel accurately captures the influence of data points on the forgetting-retention trade-off.
Central modeling choice required for RASLIK to work as described.
ad hoc to paper: Randomized antipodal search combined with permutation-projection hashing reduces selection variance and runs in sublinear time.
Key claimed property of the proposed algorithm.

invented entities (2)

Data Pareto Improvement (no independent evidence)
purpose: Formalizes how data retrieval can expand the achievable forgetting-retention frontier
New concept introduced to motivate the retrieval task.
Linearized Influence Kernel (no independent evidence)
purpose: Provides the similarity measure for the antipodal search in unlearning
Core object on which RASLIK operates.

pith-pipeline@v0.9.0 · 5507 in / 1622 out tokens · 33114 ms · 2026-05-10T08:54:13.514865+00:00 · methodology

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages

[2] Samyadeep Basu, Philip Pope, and Soheil Feizi. Influence functions in deep learning are fragile, 2021.
[3] Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, and Oskar van der Wal. Pythia: A suite for analyzing large language models across training and scaling, 2023.
[4] Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, and Dawn Song. The secret sharer: Evaluating and testing unintended memorization in neural networks, 2019.
[5] Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models, 2021.
[6] Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramer, and Chiyuan Zhang. Quantifying memorization across neural language models, 2023.
[7] Mostafa Davtalab-Olyaie and Masoud Asgharian. On pareto-optimality in the cross-efficiency evaluation. European Journal of Operational Research, 288(1):247–257, 2021. ISSN 0377-2217.
[8] Yijiang River Dong, Hongzhou Lin, Mikhail Belkin, Ramon Huerta, and Ivan Vulić. Undial: Self-distillation with adjusted logits for robust unlearning in large language models, 2024.
[9] Ronen Eldan and Mark Russinovich. Who's harry potter? approximate unlearning in llms, 2023.
[10] Chongyu Fan, Jiancheng Liu, Licong Lin, Jinghan Jia, Ruiqi Zhang, Song Mei, and Sijia Liu. Simplicity prevails: Rethinking negative preference optimization for llm unlearning. arXiv preprint arXiv:2410.07163, 2024.
[11] Lei Gao, Yue Niu, Tingting Tang, Salman Avestimehr, and Murali Annavaram. Ethos: Rectifying language models in orthogonal parameter space, 2024.
[12] Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, and Connor Leahy. The pile: An 800gb dataset of diverse text for language modeling, 2020.
[13] Amirata Ghorbani and James Zou. Data shapley: Equitable valuation of data for machine learning, 2019.
[14] Phillip Guo, Aaquib Syed, Abhay Sheshadri, Aidan Ewart, and Gintare Karolina Dziugaite. Mechanistic unlearning: Robust knowledge unlearning and editing via mechanistic localization, 2024.
[15] Yihuai Hong, Lei Yu, Haiqin Yang, Shauli Ravfogel, and Mor Geva. Intrinsic test of unlearning using parametric knowledge traces, 2025.
[16] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models, 2021.
[17] Dang Huu-Tien, Trung-Tin Pham, Hoang Thanh-Tung, and Naoya Inoue. On effects of steering latent representation for large language model unlearning, 2025.
[18] Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic, 2023.
[19] Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, and Minjoon Seo. Knowledge unlearning for mitigating privacy risks in language models, 2022.
[20] Jiabao Ji, Yujian Liu, Yang Zhang, Gaowen Liu, Ramana Rao Kompella, Sijia Liu, and Shiyu Chang. Reversing the forget-retain objectives: An efficient llm unlearning framework from logit difference. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang (eds.), Advances in Neural Information Processing Systems, volume 37, pp. ...
[21] Jinghan Jia, Jiancheng Liu, Yihua Zhang, Parikshit Ram, Nathalie Baracaldo, and Sijia Liu. Wagle: Strategic weight attribution for effective and modular unlearning in large language models. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang (eds.), Advances in Neural Information Processing Systems, volume 37, pp. 55620–...
[22] Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Bo Li, Ce Zhang, Costas J. Spanos, and Dawn Song. Efficient task-specific data valuation for nearest neighbor algorithms, 2020.
[23] Zhuoran Jin, Pengfei Cao, Chenhao Wang, Zhitao He, Hongbang Yuan, Jiachun Li, Yubo Chen, Kang Liu, and Jun Zhao. Rwku: Benchmarking real-world knowledge unlearning for large language models. Advances in Neural Information Processing Systems, 37:98213–98263, 2024.
[24] Aly Kassem, Omar Mahmoud, and Sherif Saad. Preserving privacy through dememorization: An unlearning technique for mitigating memorization risks in language models. In Houda Bouamor, Juan Pino, and Kalika Bali (eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 4360–4379, Singapore, December 2023. Associati...
[25] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In Doina Precup and Yee Whye Teh (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 1885–1894. PMLR, 06–11 Aug 2017.
[26] Yongchan Kwon, Eric Wu, Kevin Wu, and James Zou. Datainf: Efficiently estimating data influence in lora-tuned llms and diffusion models, 2024.
[27] Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel ...
[28] Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pp. 74–81, Barcelona, Spain, July 2004. Association for Computational Linguistics.
[29] Huawei Lin, Jikai Long, Zhaozhuo Xu, and Weijie Zhao. Token-wise influential training data retrieval for large language models. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar (eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, pp. 84...
[30] Bo Liu, Qiang Liu, and Peter Stone. Continual learning and private unlearning. In Sarath Chandar, Razvan Pascanu, and Doina Precup (eds.), Proceedings of The 1st Conference on Lifelong Learning Agents, volume 199 of Proceedings of Machine Learning Research, pp. 243–254. PMLR, 22–24 Aug 2022.
[31] Chris Yuhao Liu, Yaxuan Wang, Jeffrey Flanigan, and Yang Liu. Large language model unlearning via embedding-corrupted prompts. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang (eds.), Advances in Neural Information Processing Systems, volume 37, pp. 118198–118266. Curran Associates, Inc., 2024a.
[32] Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, and Meng Jiang. Towards safer large language models through machine unlearning, 2024b.
[33] Ximing Lu, Sean Welleck, Jack Hessel, Liwei Jiang, Lianhui Qin, Peter West, Prithviraj Ammanabrolu, and Yejin Choi. Quark: Controllable text generation with reinforced unlearning. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (eds.), Advances in Neural Information Processing Systems, volume 35, pp. 27591–27609. Curran Associates, ...
[34] Scott Lundberg and Su-In Lee. A unified approach to interpreting model predictions, 2017.
[35] Prasanta Chandra Mahalanobis. On the generalized distance in statistics. Proceedings of the National Institute of Sciences (Calcutta), 2:49–55, 1936.
[36] Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, and J. Zico Kolter. Tofu: A task of fictitious unlearning for llms, 2024.
[37] Yu Meng, Mengzhou Xia, and Danqi Chen. Simpo: Simple preference optimization with a reference-free reward, 2024.
[38] Thanh Tam Nguyen, Thanh Trung Huynh, Zhao Ren, Phi Le Nguyen, Alan Wee-Chung Liew, Hongzhi Yin, and Quoc Viet Hung Nguyen. A survey of machine unlearning, 2024.
[39] Team OLMo, Pete Walsh, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Shane Arora, Akshita Bhagia, Yuling Gu, Shengyi Huang, Matt Jordan, Nathan Lambert, Dustin Schwenk, Oyvind Tafjord, Taira Anderson, David Atkinson, Faeze Brahman, Christopher Clark, Pradeep Dasigi, Nouha Dziri, Michal Guerquin, Hamish Ivison, Pang Wei Koh, Jiacheng Liu, Saumya Malik, William ...
[40] OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner...
[41] Yanzhou Pan, Huawei Lin, Yide Ran, Jiamin Chen, Xiaodong Yu, Weijie Zhao, Denghui Zhang, and Zhaozhuo Xu. Alinfik: Learning to approximate linearized future influence kernel for scalable third-party LLM data valuation. In Luis Chiruzzo, Alan Ritter, and Lu Wang (eds.), Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Associ...
[42] Martin Pawelczyk, Seth Neel, and Himabindu Lakkaraju. In-context unlearning: Language models as few shot unlearners, 2024.
[43] Garima Pruthi, Frederick Liu, Satyen Kale, and Mukund Sundararajan. Estimating training data influence by tracing gradient descent. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 19920–19930. Curran Associates, Inc., 2020.
[44] Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (eds.), Advances in Neural Information Processing Systems, pp. 53728–53741. Curran Associates, Inc., 2023.
[45] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "why should i trust you?": Explaining the predictions of any classifier, 2016.
[46]

S. E. Robertson and S. Walker. Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval. In Bruce W. Croft and C. J. van Rijsbergen (eds.), SIGIR '94, pp.\ 232--241, London, 1994. Springer London. ISBN 978-1-4471-2099-5

work page 1994
[47]

Smith, and Chiyuan Zhang

Weijia Shi, Jaechan Lee, Yangsibo Huang, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, and Chiyuan Zhang. Muse: Machine unlearning six-way evaluation for language models, 2024

work page 2024
[48]

Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A

Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander...

work page 2024
[49]

Axiomatic attribution for deep networks, 2017

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks, 2017

work page 2017
[50]

Improvements to bm25 and language models examined

Andrew Trotman, Antti Puurula, and Blake Burgess. Improvements to bm25 and language models examined. In Proceedings of the 19th Australasian Document Computing Symposium, ADCS '14, pp.\ 58–65, New York, NY, USA, 2014. Association for Computing Machinery. ISBN 9781450330008

work page 2014
[51]

Rkld: Reverse kl-divergence-based knowledge distillation for unlearning personal information in large language models, 2024

Bichen Wang, Yuzhe Zi, Yixin Sun, Yanyan Zhao, and Bing Qin. Rkld: Reverse kl-divergence-based knowledge distillation for unlearning personal information in large language models, 2024

work page 2024
[52]

Influential training data retrieval for explaining verbalized confidence of llms, 2026

Yuxi Xia, Loris Schoenegger, and Benjamin Roth. Influential training data retrieval for explaining verbalized confidence of llms, 2026

work page 2026
[53]

Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, and Philip S. Yu. Machine unlearning: A survey. ACM Comput. Surv., 56 0 (1), August 2023. ISSN 0360-0300

work page 2023
[54]

Machine unlearning of pre-trained large language models, 2024

Jin Yao, Eli Chien, Minxin Du, Xinyao Niu, Tianhao Wang, Zezhou Cheng, and Xiang Yue. Machine unlearning of pre-trained large language models, 2024

work page 2024
[55]

Chih-Kuan Yeh, Joon Sik Kim, Ian E. H. Yen, and Pradeep Ravikumar. Representer point selection for explaining deep neural networks, 2018

work page 2018
[56]

Gradient ascent post-training enhances language model generalization, 2023

Dongkeun Yoon, Joel Jang, Sungdong Kim, and Minjoon Seo. Gradient ascent post-training enhances language model generalization, 2023

[57]

Negative preference optimization: From catastrophic collapse to effective unlearning, 2024

Ruiqi Zhang, Licong Lin, Yu Bai, and Song Mei. Negative preference optimization: From catastrophic collapse to effective unlearning, 2024

[58]

Decoupling the class label and the target concept in machine unlearning, 2024

Jianing Zhu, Bo Han, Jiangchao Yao, Jianliang Xu, Gang Niu, and Masashi Sugiyama. Decoupling the class label and the target concept in machine unlearning, 2024

[59]

Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach

E. Zitzler and L. Thiele. Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach. IEEE Transactions on Evolutionary Computation, 3(4):257--271, 1999

