Forget What Matters, Keep the Rest: Selective Unlearning of Informative Tokens

Junmo Kim; Seunghee Koh; Sunghyun Baek; Youngdong Kim

arXiv: 2604.17785 · v1 · submitted 2026-04-20 · 💻 cs.CL · cs.AI · cs.LG

Seunghee Koh, Sunghyun Baek, Youngdong Kim, Junmo Kim

Pith reviewed 2026-05-10 04:28 UTC · model grok-4.3

classification 💻 cs.CL cs.AI cs.LG

keywords unlearning, large language models, entropy, token weighting, selective forgetting, model utility, predictive distribution, LLM safety

The pith

Weighting unlearning loss by token entropy lets models forget key facts while preserving general capabilities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Uniform application of forgetting losses in language models often damages useful abilities along with the targeted content. The paper notes that semantically informative tokens tend to produce higher entropy in the model's next-token predictions, while predictable structural tokens show lower entropy. Entropy-guided Token Weighting scales the unlearning loss upward for high-entropy positions using only the model's internal output distribution. This selective focus removes unwanted knowledge or behaviors more effectively than uniform or parser-based alternatives. As a result, downstream task performance degrades less than with prior token-level unlearning methods.

Core claim

The authors propose Entropy-guided Token Weighting (ETW) as a token-level regularizer for unlearning in LLMs. ETW multiplies the contribution of each token to the forgetting loss by the entropy of the model's predictive distribution at that position. They report that informative tokens tend to carry higher entropy, reflecting greater predictive uncertainty, whereas structural tokens tend to remain low-entropy because they are highly predictable. According to the authors, this weighting yields stronger removal of the targeted content together with smaller utility losses on unrelated tasks than a uniform loss or other token-wise baselines.
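To make the entropy quantity concrete, here is a minimal sketch (Python standard library only) that turns one vector of next-token logits into a softmax distribution and computes its Shannon entropy in nats. The toy logits are hypothetical and only illustrate the intuition: a near-deterministic slot, such as a function word, yields entropy close to zero, while a slot with several plausible fillers approaches the maximum, log V for a vocabulary of size V.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one next-token logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical logits over a 5-token toy vocabulary (not taken from the paper).
structural_slot = [8.0, 0.0, 0.0, 0.0, 0.0]    # one continuation dominates
informative_slot = [1.0, 1.1, 0.9, 1.0, 1.05]  # several continuations compete

h_structural = entropy(softmax(structural_slot))
h_informative = entropy(softmax(informative_slot))
print(f"structural:  {h_structural:.3f} nats")
print(f"informative: {h_informative:.3f} nats (max possible {math.log(5):.3f})")
```

Working in nats versus bits only rescales every weight by the same constant, so the choice of logarithm base does not change which tokens are emphasized.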

What carries the argument

Entropy-guided Token Weighting (ETW), a regularizer that scales each token's unlearning loss by the entropy of the model's softmax distribution over next tokens.
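A minimal sketch of how such a weight could enter a forgetting loss, under stated assumptions: the base forget objective here is plain gradient ascent (minimizing the mean of log p(y_t | x_<t) over forget-set tokens), the entropy weights are treated as constants with no gradient flowing through them, and the weighted sum is normalized by the total weight. The paper may pair ETW with other objectives (for example NPO-style losses) and may normalize differently; the function names below are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one next-token logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def entropy_from_logits(logits: Sequence[float]) -> float:
    """Shannon entropy (nats) of softmax(logits)."""
    return -sum(math.exp(lp) * lp for lp in log_softmax(logits))


def uniform_ga_loss(step_logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Uniform gradient-ascent forget loss: mean log-probability of the forget tokens."""
    log_probs = [log_softmax(z)[y] for z, y in zip(step_logits, targets)]
    return sum(log_probs) / len(log_probs)


def etw_ga_loss(step_logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Hypothetical ETW variant: each token's log-probability is weighted by the
    entropy of the model's own predictive distribution at that position.

    Minimizing this value lowers log p(y_t | x_<t), mostly at high-entropy positions.
    In an autograd framework the weights would be detached (an assumption here).
    """
    weights = [entropy_from_logits(z) for z in step_logits]
    log_probs = [log_softmax(z)[y] for z, y in zip(step_logits, targets)]
    total_w = sum(weights)
    if total_w == 0.0:  # every position fully confident: nothing to reweight
        return 0.0
    return sum(w * lp for w, lp in zip(weights, log_probs)) / total_w


# Two forget-set positions: a predictable structural token and an uncertain content token.
step_logits = [[8.0, 0.0, 0.0, 0.0, 0.0], [1.0, 1.1, 0.9, 1.0, 1.05]]
targets = [0, 1]
print("uniform:", round(uniform_ga_loss(step_logits, targets), 4))
print("ETW:    ", round(etw_ga_loss(step_logits, targets), 4))
```

With these toy numbers the ETW value sits close to the log-probability of the uncertain content token, whereas the uniform mean splits the forgetting pressure evenly between the two positions.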

If this is right

Unlearning concentrates forgetting pressure on uncertain (high-entropy) positions rather than penalizing all tokens equally.
Low-entropy structural tokens experience smaller updates, limiting damage to syntax and fluency.
The approach needs no external parsers, ground-truth confidence scores, or additional labels.
Targeted forgetting becomes possible with smaller overall changes to the model's parameter space.
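The claim that low-entropy tokens receive smaller updates can be checked at the level of logits. For a per-token term w_t · log p(y_t | x_<t) with w_t treated as a constant (an assumption), the gradient with respect to that position's logits is w_t (e_y − p), where e_y is the one-hot vector of the target token and p is the softmax distribution. The sketch below, with hypothetical helper names, computes this gradient so that the weight's effect is visible as a pure rescaling of each position's update.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one next-token logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_logit_grad(logits: Sequence[float], target: int, weight: float) -> List[float]:
    """Gradient of weight * log softmax(logits)[target] w.r.t. the logits,
    with the weight treated as a constant: weight * (one_hot(target) - p)."""
    probs = softmax(logits)
    return [weight * ((1.0 if i == target else 0.0) - p) for i, p in enumerate(probs)]


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


logits = [1.0, 1.1, 0.9, 1.0, 1.05]  # hypothetical high-entropy position
g1 = weighted_logit_grad(logits, target=1, weight=1.0)
g2 = weighted_logit_grad(logits, target=1, weight=2.0)
print(l2_norm(g1), l2_norm(g2))  # the second norm is twice the first
```

Note that for a confidently predicted target the factor (e_y − p) is already small, so an entropy weight compounds an effect that plain gradient ascent partly has on its own.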

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The entropy proxy could extend to other selective-update settings such as editing factual knowledge or mitigating bias without full retraining.
Sequence-aware variants of the weighting might better handle facts that span multiple tokens.
Direct comparison of model entropy against human annotations of token importance would test how well the proxy generalizes across domains.
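Figure 2 reports ROC–AUC for distinguishing token types. As a hedged sketch of how such a score can be computed, the function below treats entropy as a detector score and human (or parser) informativeness labels as ground truth, and returns the probability that a randomly chosen informative token has higher entropy than a randomly chosen structural one (the Mann–Whitney form of ROC–AUC, with ties counted as half). The entropies and labels are illustrative placeholders, not results from the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC via pairwise comparison: P(score_pos > score_neg), ties count 0.5.

    labels: 1 = informative token, 0 = structural token.
    O(P * N); a rank-based version is preferable for large token sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one informative and one structural token")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Placeholder per-token entropies (nats) and annotations, for illustration only.
entropies = [0.02, 2.10, 0.15, 1.40, 0.60, 0.90]
is_informative = [0, 1, 0, 1, 0, 1]
print(f"ROC-AUC of entropy as an informativeness detector: {roc_auc(entropies, is_informative):.3f}")
```

An AUC of 0.5 means entropy carries no information about the annotations; values near 1.0 support the proxy for that domain.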

Load-bearing premise

The entropy of the model's next-token predictions reliably marks which tokens carry the semantic content or behavior that must be removed.

What would settle it

Perform ETW unlearning on a model for specific facts or behaviors, then measure both the continued generation rate of the forgotten material and accuracy on standard benchmarks such as MMLU; if ETW produces weaker forgetting or larger utility drops than uniform baselines, the central benefit does not hold.
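One way to operationalize the "continued generation rate" of forgotten material is an overlap score between the original forget-set answer and what the unlearned model now generates. The sketch below computes ROUGE-L recall (longest-common-subsequence length divided by reference length) on lowercased whitespace tokens; it is a simplified stand-in, not the exact scorer used in the paper or any benchmark, and the example strings are hypothetical. Lower values after unlearning indicate stronger forgetting; utility would be measured separately, for example with MMLU accuracy.

```python
from typing import List, Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via a rolling dynamic-programming row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_recall(reference: str, generation: str) -> float:
    """LCS length divided by the reference length (lowercased whitespace tokens)."""
    ref = reference.lower().split()
    gen = generation.lower().split()
    if not ref:
        return 0.0
    return lcs_length(ref, gen) / len(ref)


def mean_forget_recall(pairs: List[Tuple[str, str]]) -> float:
    """Average ROUGE-L recall over (forget-set answer, model generation) pairs."""
    return sum(rouge_l_recall(ref, gen) for ref, gen in pairs) / len(pairs)


# Hypothetical forget-set answers and generations before/after unlearning.
before = [("The author was born in Paris in 1975", "The author was born in Paris in 1975")]
after = [("The author was born in Paris in 1975", "I am not sure where the author was born")]
print(mean_forget_recall(before), mean_forget_recall(after))
```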

Figures

Figures reproduced from arXiv: 2604.17785 by Junmo Kim, Seunghee Koh, Sunghyun Baek, Youngdong Kim.

**Figure 2.** ROC–AUC curves for distinguishing informa…

**Figure 3.** Token-wise histograms for informative and structural tokens. We compare ETW with other weighting…

**Figure 4.** Token-wise visualization on TOFU forget samples, highlighting informative annotations and forget loss…

**Figure 5.** Aggregated score (Agg.) and privacy leakage…

**Figure 6.** Trade-off between model utility and forget quality on TOFU. Larger markers indicate the best configuration

**Figure 7.** Token-wise histogram for informative and…

**Figure 8.** Forget quality (−log(FQ)) and model utility degradation (∆MU) across different temperatures and forget splits for LLaMA 3.2-1B and 3B models. The configuration used in…

**Figure 9.** Epoch-wise token regularization under TOFU 10% forget split. Token-level weights of ETW, SatImp,…

Original abstract

Unlearning in large language models (LLMs) has emerged as a promising safeguard against adversarial behaviors. When the forgetting loss is applied uniformly without considering token-level semantic importance, model utility can be unnecessarily degraded. Recent studies have explored token-wise loss regularizers that prioritize informative tokens, but largely rely on ground-truth confidence or external linguistic parsers, which limits their ability to capture contextual information or the model's overall predictive state. Intuitively, function words like "the" primarily serve syntactic roles and are highly predictable with little ambiguity, but informative words admit multiple plausible alternatives with greater uncertainty. Based on this intuition, we propose Entropy-guided Token Weighting (ETW), a token-level unlearning regularizer that uses entropy of the predictive distribution as a proxy for token informativeness. We demonstrate that informative tokens tend to have higher entropy, whereas structural tokens tend to have lower entropy. This behavior enables ETW to achieve more effective unlearning while better preserving model utility than existing token-level approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper proposes weighting the unlearning loss by the model's own predictive entropy to focus on informative tokens; the idea is self-contained, but its empirical support was not visible in the text reviewed here.

The one or two things to know: the paper uses the entropy of an LLM's next-token predictions to decide which tokens receive more emphasis during unlearning, and it positions this as an improvement over prior token-level methods because it needs neither external parsers nor ground-truth labels. The new element is the entropy proxy itself. Earlier token-level approaches to selective unlearning often depended on parser outputs or on the model's confidence in the ground-truth token. Here the weighting comes straight from the model's predictive distribution, which makes the approach self-contained and potentially easier to apply across models and tasks.

The paper does a solid job of explaining the motivation. Structural tokens such as function words are usually low-entropy because they are predictable, while informative tokens have higher entropy because more continuations are plausible. This leads to a regularizer that applies stronger forgetting pressure where it matters and lighter pressure elsewhere, protecting overall utility.

Where it is softer is the supporting data in what we have. The abstract claims that informative tokens show higher entropy and that the method achieves more effective unlearning with better utility preservation, but without quantitative results, ablation studies, or error bars in the text at hand, those claims remain untested. The key assumption, that entropy is a reliable indicator of semantic informativeness for unlearning purposes, needs empirical backing to show that it does not create unexpected failure modes on downstream tasks or in specific scenarios.

Readers who work on LLM safety, model editing, or unlearning algorithms would find this relevant, and someone looking for a lightweight improvement to existing regularizers could pick up the method and test it quickly. The paper deserves a serious referee. The logic is clear and the proposal is easy to reproduce, so reviewers can focus on whether the experiments confirm the intuition and deliver the promised trade-off.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Entropy-guided Token Weighting (ETW), a token-level regularizer for LLM unlearning that weights the forgetting loss by the entropy of the model's own next-token predictive distribution. The central intuition is that high-entropy tokens are semantically informative while low-entropy tokens are structural/function words; ETW therefore applies stronger unlearning pressure to the former. The authors claim this yields superior unlearning effectiveness and utility preservation compared with prior token-wise methods that rely on ground-truth confidence or external parsers.

Significance. If the empirical claims hold, ETW supplies a simple, fully model-intrinsic, parameter-free mechanism for selective unlearning that avoids external linguistic tools. This could meaningfully improve the practicality of unlearning safeguards in deployed LLMs.

major comments (2)

[Abstract, §3] The central claim that 'informative tokens tend to have higher entropy', and that ETW therefore produces better unlearning-utility trade-offs, is asserted without any reported correlation statistics, ablation tables, or quantitative comparison to baselines. The soundness assessment cannot be completed until the experimental sections supply these data with error bars and statistical tests.
[§4] Method: the weighting formula w_t = H(p(· | x_<t)) is presented as following directly from the entropy-informativeness intuition, yet no derivation or sensitivity analysis shows why this particular functional form is preferred over alternatives (e.g., normalized entropy, mutual information, or the variance of the predictive distribution).
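A sensitivity check on the functional form can start from a simple observation: if the weighted forget loss for a sequence is normalized by the sum of its weights, any constant rescaling of the entropies (dividing by log V, or by the sequence mean) leaves that loss unchanged, so only non-linear transforms actually change the trade-off. The sketch below, with hypothetical helper names and placeholder numbers, compares raw entropy, vocabulary-normalized entropy, sequence-mean-normalized entropy, and a sharpened variant h^γ. It does not cover mutual information, which typically requires multiple stochastic forward passes or an ensemble.

```python
import math
from typing import List, Sequence


def raw_weights(entropies: Sequence[float]) -> List[float]:
    return list(entropies)


def vocab_normalized_weights(entropies: Sequence[float], vocab_size: int) -> List[float]:
    """Entropy divided by its maximum, log(vocab_size), giving values in [0, 1]."""
    return [h / math.log(vocab_size) for h in entropies]


def sequence_normalized_weights(entropies: Sequence[float]) -> List[float]:
    """Entropy divided by the sequence mean, so the weights average to 1."""
    mean_h = sum(entropies) / len(entropies)
    if mean_h == 0.0:
        return [1.0] * len(entropies)
    return [h / mean_h for h in entropies]


def sharpened_weights(entropies: Sequence[float], gamma: float) -> List[float]:
    """Non-linear variant h ** gamma; gamma > 1 concentrates weight on high-entropy tokens."""
    return [h ** gamma for h in entropies]


def normalized_weighted_loss(token_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-token forget losses, normalized by the total weight."""
    return sum(w * loss for w, loss in zip(weights, token_losses)) / sum(weights)


# Placeholder per-token entropies (nats) and per-token forget losses for one sequence.
entropies = [0.05, 1.2, 2.0]
token_losses = [-0.01, -1.0, -2.5]
VOCAB_SIZE = 50_000  # hypothetical vocabulary size

variants = {
    "raw": raw_weights(entropies),
    "vocab-normalized": vocab_normalized_weights(entropies, VOCAB_SIZE),
    "sequence-normalized": sequence_normalized_weights(entropies),
    "sharpened (gamma=2)": sharpened_weights(entropies, 2.0),
}
for name, w in variants.items():
    print(f"{name:>20}: {normalized_weighted_loss(token_losses, w):.5f}")
```

Without per-sequence normalization (for example, a batch-level sum), the rescalings do change the effective step size relative to any retain-set term, which is where a sweep would be informative.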

minor comments (2)

[§4] Notation: the manuscript should explicitly define whether entropy is computed over the full vocabulary or a top-k subset, and whether temperature scaling is applied before entropy calculation.
[§2] Related work: the comparison to prior token-level unlearning regularizers would benefit from a concise table summarizing their information sources (ground-truth, parser, model entropy).
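The notation question matters because the entropy value depends on both choices. The sketch below (a hypothetical helper, not the paper's code) computes entropy after optional temperature scaling and optional top-k truncation with renormalization: raising the temperature flattens the distribution and raises entropy, while truncating to the k largest logits caps entropy at log k. Figure 8 reports results across different temperatures.

```python
import math
from typing import Optional, Sequence


def entropy_from_logits(logits: Sequence[float],
                        temperature: float = 1.0,
                        top_k: Optional[int] = None) -> float:
    """Shannon entropy (nats) of softmax(logits / temperature).

    If top_k is given, only the k largest scaled logits are kept and the
    distribution is renormalized over them before computing entropy.
    """
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    if top_k is not None:
        scaled = sorted(scaled, reverse=True)[:top_k]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


logits = [3.0, 2.0, 1.0, 0.5, 0.0, -1.0]  # hypothetical next-token logits
for t in (0.5, 1.0, 2.0):
    print(f"T={t}: full vocab {entropy_from_logits(logits, t):.3f}, "
          f"top-3 {entropy_from_logits(logits, t, top_k=3):.3f}")
```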

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major point below and indicate the revisions we will make to improve the manuscript's rigor.

Referee: [Abstract, §3] The central claim that 'informative tokens tend to have higher entropy', and that ETW therefore produces better unlearning-utility trade-offs, is asserted without any reported correlation statistics, ablation tables, or quantitative comparison to baselines. The soundness assessment cannot be completed until the experimental sections supply these data with error bars and statistical tests.

Authors: We agree that explicit quantitative support strengthens the central claim. The experimental section (§5) already reports unlearning-utility trade-offs against token-level baselines, but we did not include direct correlation statistics (e.g., between entropy and token informativeness proxies) or statistical significance tests. In revision we will add these: a correlation analysis (Pearson coefficients with error bars), ablation tables, and t-test p-values across multiple random seeds. Revision: yes.

Referee: [§4] Method: the weighting formula w_t = H(p(· | x_<t)) is presented as following directly from the entropy-informativeness intuition, yet no derivation or sensitivity analysis shows why this particular functional form is preferred over alternatives (e.g., normalized entropy, mutual information, or the variance of the predictive distribution).

Authors: The raw entropy form follows directly from the definition of predictive uncertainty without introducing scale-dependent normalizers. We acknowledge the absence of a sensitivity analysis. We will revise §4 with a short justification paragraph and add an ablation study comparing raw entropy against normalized entropy, predictive variance, and mutual information, reporting the resulting unlearning-utility curves. Revision: yes.
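The correlation analysis promised in the first response could be sketched as follows. With binary informativeness labels (1 = informative, 0 = structural), Pearson's r against per-token entropy is the point-biserial correlation, and a percentile bootstrap over tokens gives an error bar. The data are placeholders, and the resampling scheme (tokens treated as independent) is an assumption; resampling whole sequences or random seeds would be more conservative.

```python
import math
import random
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; with 0/1 labels in y this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x: Sequence[float], y: Sequence[float], n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for Pearson r, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    samples = []
    while len(samples) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            samples.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
        except ValueError:
            continue  # skip degenerate resamples (all labels identical)
    samples.sort()
    lo = samples[int((alpha / 2) * n_boot)]
    hi = samples[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Placeholder per-token entropies (nats) and informativeness labels.
entropies = [0.03, 1.8, 0.2, 2.4, 0.5, 1.1, 0.1, 2.0, 0.7, 1.5]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 1, 0]
r = pearson(entropies, labels)
lo, hi = bootstrap_ci(entropies, labels)
print(f"point-biserial r = {r:.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```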

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper's central construction is Entropy-guided Token Weighting (ETW), which directly computes per-token weights from the entropy of the model's own next-token predictive distribution. This is an explicit, non-fitted proxy derived from the forward pass rather than from any target unlearning labels or fitted parameters. The abstract states the supporting intuition and the empirical demonstration that informative tokens exhibit higher entropy, but presents neither as a mathematical derivation that reduces to the inputs by construction. No equations, self-citations, uniqueness theorems, or ansatzes are referenced in the provided text that would create a load-bearing loop. The method therefore remains falsifiable via downstream unlearning-utility trade-offs and does not collapse into a renaming or self-definition of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the unproven assumption that predictive entropy correlates with token informativeness and that this correlation can be exploited for selective unlearning without side effects.

axioms (1)

domain assumption: Entropy of the predictive distribution serves as a reliable proxy for token semantic informativeness.
Stated in the abstract as the basis for ETW; the abstract itself provides no external validation or derivation.

pith-pipeline@v0.9.0 · 5477 in / 1017 out tokens · 27525 ms · 2026-05-10T04:28:01.695266+00:00 · methodology
