Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning
Pith reviewed 2026-05-10 08:54 UTC · model grok-4.3
The pith
Randomized antipodal search on influence kernels expands the Pareto frontier for LLM unlearning
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that Randomized Antipodal Search on Linearized Influence Kernel (RASLIK) realizes data Pareto improvement for LLM unlearning by combining permutation-projection hashing with randomized antipodal search. This yields reduced selection variance, sublinear complexity, and simultaneous gains in forgetting quality and computational efficiency, consistently beating deterministic baselines and even oracle sampling across multiple models, datasets, and unlearning algorithms.
What carries the argument
RASLIK, a retrieval algorithm that performs randomized antipodal search over a linearized influence kernel to pick data points whose removal best expands the forgetting-retention trade-off frontier.
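To make the retrieval idea concrete, here is a minimal sketch of antipodal lookup through a hash index. It is not the paper's algorithm: the page does not spell out permutation-projection hashing, so plain sign random projections (SimHash-style) are swapped in, and the names `make_planes`, `sign_code`, `build_index`, and `antipodal_candidates` are hypothetical. The only idea illustrated is that hashing the negated query lands in the bucket of vectors pointing the opposite way, without scanning every stored vector.

```python
import random

def make_planes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes (assumed hashing scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def sign_code(vec, planes):
    """Hash a vector to the tuple of signs of its projections."""
    return tuple(sum(p * v for p, v in zip(plane, vec)) > 0 for plane in planes)

def build_index(vectors, planes):
    """Group vector indices by hash code."""
    buckets = {}
    for i, vec in enumerate(vectors):
        buckets.setdefault(sign_code(vec, planes), []).append(i)
    return buckets

def antipodal_candidates(query, planes, buckets):
    """Look up the bucket of -query: candidates roughly opposite to query."""
    negated = [-x for x in query]
    return buckets.get(sign_code(negated, planes), [])
```

In practice one would likely build several such tables with different seeds and re-rank the returned candidates by their exact kernel value; both steps are omitted here.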
Load-bearing premise
The linearized influence kernel must reliably measure how individual data points drive forgetting versus retention.
What would settle it
On a standard unlearning benchmark, if RASLIK-selected data produces no measurable expansion of the Pareto frontier over random or deterministic selection, the central claim fails.
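The test described above can be sketched in code, under stated assumptions. The page does not give the paper's formal definition of data Pareto improvement, so this uses ordinary two-objective dominance on hypothetical `(forget_quality, retain_quality)` pairs where higher is better on both axes. The function names `pareto_front` and `expands` are illustrative only.

```python
def weakly_dominates(a, b):
    """a is at least as good as b on both objectives."""
    return a[0] >= b[0] and a[1] >= b[1]

def pareto_front(points):
    """Return the non-dominated points, sorted and de-duplicated."""
    front = [
        p for p in points
        if not any(weakly_dominates(q, p) and q != p for q in points)
    ]
    return sorted(set(front))

def expands(new_front, base_front):
    """True if new_front covers base_front and pushes past it somewhere."""
    covers = all(
        any(weakly_dominates(a, b) for a in new_front) for b in base_front
    )
    pushes = any(
        not any(weakly_dominates(b, a) for b in base_front) for a in new_front
    )
    return covers and pushes
```

Under this reading, the claim would fail on a benchmark if `expands(front_from_raslik, front_from_random)` came out false once run-to-run noise is accounted for.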
Original abstract
Large language models (LLMs) sometimes memorize undesirable knowledge, which must be removed after deployment. Prior work on machine unlearning has focused largely on optimization methods that adjust parameters to enforce forgetting while preserving retention. However, these approaches assume that the forget and retain sets are readily available, which rarely holds in practice. Unlearning is typically triggered by an undesired generation at inference time, making the retrieval of relevant data the central challenge. We introduce the notion of data Pareto improvement for LLM unlearning, which formalizes how retrieval can expand the achievable trade-off frontier between forgetting and retention. To realize this principle, we propose Randomized Antipodal Search on Linearized Influence Kernel (RASLIK), a retrieval algorithm that combines permutation-projection hashing with randomized antipodal search. RASLIK reduces selection variance, achieves sublinear complexity, and yields a double gain in both quality and efficiency. Across multiple models, datasets, and unlearning algorithms, RASLIK consistently outperforms deterministic baselines and even oracle sampling, establishing randomized search as a principled and scalable solution for data-centric unlearning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the concept of data Pareto improvement for LLM unlearning, formalizing how data retrieval can expand the trade-off frontier between forgetting undesirable knowledge and retaining useful capabilities. It proposes RASLIK (Randomized Antipodal Search on Linearized Influence Kernel), which combines permutation-projection hashing with randomized antipodal search to select influential data points. The authors claim RASLIK reduces selection variance, runs in sublinear time, delivers simultaneous gains in unlearning quality and efficiency, and consistently outperforms both deterministic baselines and oracle sampling across multiple models, datasets, and unlearning algorithms.
Significance. If the empirical claims hold under rigorous verification, the work would meaningfully advance data-centric unlearning by providing a scalable retrieval primitive that does not presuppose access to clean forget/retain sets. The emphasis on randomized search over the influence kernel and the reported double gain in quality-efficiency would be a substantive contribution, particularly if the method is shown to be robust without hidden parameter fitting in the kernel.
major comments (2)
- [Experimental evaluation] The central claim of outperforming oracle sampling is load-bearing for the superiority argument. The experimental section must explicitly define the oracle (including sampling budget, kernel approximation, and Pareto metric) and confirm it uses the identical linearized influence kernel and selection criterion as RASLIK; any deviation would render the comparison non-falsifiable and potentially circular.
- [Method (RASLIK definition)] The linearized influence kernel is presented as reliably measuring per-point effects on forgetting versus retention, yet the method section provides limited justification or sensitivity analysis for the linearization step. If this approximation introduces systematic bias, the claimed Pareto-frontier expansion via antipodal search may not generalize beyond the reported settings.
minor comments (2)
- Figure captions and legends should explicitly state whether error bars represent standard deviation, standard error, or confidence intervals, and whether statistical significance tests were applied to the reported outperformance margins.
- Notation for the permutation-projection hashing and antipodal search steps could be clarified with a small pseudocode block or explicit complexity derivation to aid reproducibility.
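The distinction raised in the first minor comment matters because the three kinds of error bar differ in width for the same data. A small sketch, assuming a handful of hypothetical per-seed scores and a normal-approximation 95% interval (for so few runs a t-based interval would be wider):

```python
import math
import statistics

def error_bars(samples, z=1.96):
    """Return mean, sample standard deviation, standard error, and a 95% CI."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)   # spread of individual runs
    se = sd / math.sqrt(n)           # uncertainty of the mean
    return {
        "mean": mean,
        "sd": sd,
        "se": se,
        "ci95": (mean - z * se, mean + z * se),  # normal approximation
    }

# Hypothetical scores from five seeds, for illustration only.
stats = error_bars([0.60, 0.62, 0.58, 0.61, 0.59])
```

A margin that looks decisive against standard-error bars can sit inside standard-deviation bars, which is why the caption needs to say which one is plotted.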
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help clarify the presentation of our contributions. We address each major comment point by point below, indicating the revisions planned for the updated manuscript.
Point-by-point responses
- Referee: [Experimental evaluation] The central claim of outperforming oracle sampling is load-bearing for the superiority argument. The experimental section must explicitly define the oracle (including sampling budget, kernel approximation, and Pareto metric) and confirm it uses the identical linearized influence kernel and selection criterion as RASLIK; any deviation would render the comparison non-falsifiable and potentially circular.
  Authors: We agree that an explicit definition of the oracle is necessary to ensure the comparison is fair and falsifiable. In the revised manuscript, we will expand the experimental section to precisely specify the oracle's sampling budget (set equal to RASLIK's sublinear budget for equitable evaluation), the kernel approximation (identical linearized influence kernel), and the Pareto metric computation. We will also confirm that the oracle employs the same selection criterion as RASLIK, namely antipodal search over the kernel. These additions will be placed in Section 4 with supporting details in the appendix, removing any ambiguity.
  Revision: yes
- Referee: [Method (RASLIK definition)] The linearized influence kernel is presented as reliably measuring per-point effects on forgetting versus retention, yet the method section provides limited justification or sensitivity analysis for the linearization step. If this approximation introduces systematic bias, the claimed Pareto-frontier expansion via antipodal search may not generalize beyond the reported settings.
  Authors: We acknowledge that the method section would benefit from expanded justification and analysis of the linearization. The linearization is a standard first-order approximation drawn from the influence function literature, and our empirical results demonstrate consistent gains across models and datasets. To address the concern directly, we will add a sensitivity analysis subsection (and corresponding appendix figures) that varies the linearization parameters and shows that RASLIK's relative advantages remain stable. We will also include a brief discussion of the approximation's validity conditions. Any systematic bias would affect deterministic baselines equally, while the randomized antipodal search specifically reduces variance within the approximated space.
  Revision: partial
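The validity conditions mentioned in this exchange can be illustrated on a toy model. The sketch below assumes one common first-order form, a gradient inner product scaled by the learning rate (TracIn-style); whether RASLIK's kernel takes exactly this form is an assumption, and the squared-loss linear model and all names are hypothetical. It compares the linearized prediction of a test-loss change with the change actually produced by one gradient step.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss(w, x, y):
    """Squared loss 0.5 * (w.x - y)^2 for a linear model."""
    err = dot(w, x) - y
    return 0.5 * err * err

def grad(w, x, y):
    """Gradient of the squared loss with respect to w."""
    err = dot(w, x) - y
    return [err * xi for xi in x]

def first_order_influence(w, train, test, lr):
    """Linearized change in test loss after one SGD step on `train`."""
    return -lr * dot(grad(w, *train), grad(w, *test))

def actual_change(w, train, test, lr):
    """Exact change in test loss after the same SGD step."""
    g = grad(w, *train)
    w_new = [wi - lr * gi for wi, gi in zip(w, g)]
    return loss(w_new, *test) - loss(w, *test)

def gap(w, train, test, lr):
    """Absolute error of the linearization at a given step size."""
    return abs(first_order_influence(w, train, test, lr)
               - actual_change(w, train, test, lr))
```

For this quadratic loss the gap is exactly the second-order term, so it shrinks quadratically with the step size. That is the kind of dependence a sensitivity analysis of the linearization would need to report.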
Circularity Check
Derivation chain is self-contained with independent experimental validation
Full rationale
The paper introduces the notion of data Pareto improvement as a formalization for how retrieval expands the forgetting-retention frontier, then defines RASLIK as a retrieval method using permutation-projection hashing plus randomized antipodal search on a linearized influence kernel. Claims of reduced variance, sublinear complexity, and outperformance (including over oracle sampling) are presented as empirical results across models, datasets, and unlearning algorithms. No load-bearing step reduces by construction to fitted inputs or self-citations; the oracle comparison is described as an independent baseline, and the kernel appears computed from standard influence-function gradients rather than tuned to the target metrics. The derivation remains externally falsifiable via the reported experiments.
Axiom & Free-Parameter Ledger
free parameters (1)
- parameters defining the linearized influence kernel
axioms (2)
- domain assumption: The linearized influence kernel accurately captures the influence of data points on the forgetting-retention trade-off
- ad hoc to paper: Randomized antipodal search combined with permutation-projection hashing reduces selection variance and runs in sublinear time
invented entities (2)
- Data Pareto Improvement: no independent evidence
- Linearized Influence Kernel: no independent evidence