Recognition: 2 Lean theorem links
Diversity in Large Language Models under Supervised Fine-Tuning
Pith reviewed 2026-05-12 02:59 UTC · model grok-4.3
The pith
Tempered Focal loss improves diversity in fine-tuned large language models without sacrificing quality
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that the observed decline in generative diversity after supervised fine-tuning stems from neglect of low-frequency patterns in the fine-tuning datasets and forgetting of preexisting knowledge. Motivated by this, the authors propose the Tempered Focal (TOFU) loss, which addresses both issues at once. Extensive experiments are reported to confirm that TOFU enhances output diversity across various models and benchmarks while preserving response quality.
What carries the argument
The Tempered Focal (TOFU) loss, a novel objective function designed to simultaneously counteract the neglect of low-frequency patterns and the forgetting of preexisting knowledge during supervised fine-tuning.
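The review does not reproduce the paper's formula for TOFU. As a rough, non-authoritative illustration of the ingredients the name suggests, the sketch below combines a focal-style down-weighting of well-fit tokens with a tempered logarithm; the function names, the hyperparameters gamma and t, and the exact combination are assumptions for illustration, not the authors' definition.

```python
# Hypothetical sketch of a tempered focal token loss (NOT the paper's exact TOFU definition).
# gamma controls focal down-weighting of well-fit tokens; t controls the tempered logarithm.
import torch
import torch.nn.functional as F

def tempered_log(p: torch.Tensor, t: float) -> torch.Tensor:
    """Tempered logarithm log_t(p) = (p^(1-t) - 1) / (1 - t); recovers log(p) as t -> 1."""
    if abs(t - 1.0) < 1e-6:
        return torch.log(p)
    return (p.pow(1.0 - t) - 1.0) / (1.0 - t)

def tempered_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                        gamma: float = 2.0, t: float = 0.7) -> torch.Tensor:
    """Per-token loss -(1 - p_y)^gamma * log_t(p_y), averaged over all positions.

    logits:  (batch, seq, vocab) unnormalized scores
    targets: (batch, seq) gold token ids
    """
    log_probs = F.log_softmax(logits, dim=-1)
    p_y = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1).exp()  # prob of gold token
    focal_weight = (1.0 - p_y).pow(gamma)  # small when the token is already well fit
    return (-focal_weight * tempered_log(p_y, t)).mean()
```

Intuitively, the focal factor shifts gradient mass toward tokens the model fits poorly (often low-frequency patterns), while the tempered logarithm caps the penalty on already well-fit tokens, which is one plausible way to limit drift from the pretrained distribution.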
If this is right
- TOFU offers a principled method for performing SFT that maintains generative breadth.
- Models trained with TOFU produce more varied responses on standard benchmarks.
- The method scales to different model sizes and tasks.
- High response quality is retained alongside increased diversity.
Where Pith is reading between the lines
- Similar loss modifications could be explored for other alignment methods like preference tuning.
- Future work might test if TOFU helps in reducing hallucinations by preserving diverse knowledge.
- Applying this to multimodal models could enhance creative generation in images or text.
Load-bearing premise
The decline in diversity after SFT is primarily driven by neglect of low-frequency patterns in fine-tuning datasets and forgetting of preexisting knowledge.
What would settle it
Measuring output diversity on a benchmark before and after applying TOFU; if diversity does not increase or quality drops significantly, the effectiveness of TOFU would be challenged.
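To make this test concrete, here is a minimal sketch of one standard diversity measure, distinct-n (the fraction of unique n-grams in a pool of generations), evaluated on outputs from a base and a fine-tuned checkpoint. The metric choice and the toy data are illustrative assumptions; the paper's benchmarks may rely on different measures (e.g. self-BLEU or NoveltyBench-style scores).

```python
# Illustrative diversity check: distinct-n over tokenized generations from two checkpoints.
from collections import Counter

def distinct_n(generations: list[list[str]], n: int = 2) -> float:
    """Fraction of unique n-grams among all n-grams in the generation pool."""
    ngrams = Counter()
    for tokens in generations:
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

# Toy usage: a drop from the base model to the SFT model would indicate narrowing.
base_outputs = [["the", "cat", "sat", "down"], ["a", "dog", "ran", "away"]]
sft_outputs = [["the", "cat", "sat", "down"], ["the", "cat", "sat", "still"]]
print(distinct_n(base_outputs), distinct_n(sft_outputs))
```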
Original abstract
Supervised Fine-Tuning (SFT) is essential for aligning Large Language Models (LLMs) with user intent, yet it is believed to suppress generative diversity. Although this reduction is frequently referenced, formal empirical testing of the phenomenon remains limited. The expressiveness of LLMs by itself was addressed by multiple prior methods. Their varying perspectives suggest that deeper investigation could yield further improvements. In this study, we attribute the decline to two primary drivers: the neglect of low-frequency patterns within fine-tuning datasets and the forgetting of preexisting knowledge. Motivated by our theoretical analysis, we develop Tempered Focal (TOFU) loss, a novel objective that addresses both stated challenges simultaneously. Our extensive evaluation confirms at scale that generation breadth narrows after SFT and strengthens the hypothesis explaining this effect. Across multiple models and benchmarks, we demonstrate that TOFU enhances output diversity while preserving high response quality, offering a principled approach to SFT.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that supervised fine-tuning (SFT) reduces generative diversity in LLMs due to two primary drivers: neglect of low-frequency patterns in fine-tuning datasets and forgetting of preexisting knowledge. Motivated by theoretical analysis, the authors introduce the Tempered Focal (TOFU) loss to address both issues simultaneously. Experiments across multiple models and benchmarks show that SFT narrows generation breadth, while TOFU enhances output diversity metrics without sacrificing response quality, framing TOFU as a principled SFT objective.
Significance. If the attribution to the two mechanisms is substantiated and the improvements are shown to arise specifically from TOFU's design rather than generic regularization, the work would be significant for LLM alignment research. Maintaining diversity during SFT is a practical concern for applications needing varied outputs, and a theoretically motivated loss could inform standard fine-tuning practices if the results prove robust and reproducible.
Major comments (2)
- [Abstract and theoretical analysis] The central claim that diversity decline is driven by neglect of low-frequency patterns and forgetting of preexisting knowledge (Abstract) lacks direct empirical support in the described experiments, such as token-frequency histograms of generations versus pretraining data or knowledge-probing accuracy before/after SFT. Without these measurements or ablations isolating the mechanisms, the link between the drivers and observed effects remains unverified, weakening the 'principled approach' framing for TOFU (a minimal sketch of one such measurement follows this list).
- [Experimental evaluation] The experimental evaluation (described in the abstract as 'extensive evaluation at scale') demonstrates TOFU improves diversity while preserving quality, but does not include ablations comparing TOFU to standard focal loss, other regularizers, or variants that address only one driver. This makes it unclear whether gains target the stated challenges or arise incidentally, undermining the causal attribution.
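As referenced in the first comment, here is a minimal sketch of the kind of measurement requested: the share of generated tokens that are rare in a reference corpus (e.g. pretraining data), compared before and after SFT. The rarity cutoff, the reference corpus, and the tokenization are assumptions for illustration; the paper does not specify this procedure.

```python
# Illustrative low-frequency-pattern check: fraction of generated tokens that are rare
# in a reference corpus. Corpus choice and the rarity cutoff are assumed, not from the paper.
from collections import Counter

def rare_token_share(generated_tokens: list[str], reference_counts: Counter,
                     rare_quantile: float = 0.2) -> float:
    """Fraction of generated tokens whose reference-corpus frequency falls in the bottom quantile."""
    freqs = sorted(reference_counts.values())
    cutoff = freqs[int(rare_quantile * (len(freqs) - 1))]
    rare_vocab = {tok for tok, c in reference_counts.items() if c <= cutoff}
    hits = sum(1 for tok in generated_tokens if tok in rare_vocab)
    return hits / len(generated_tokens) if generated_tokens else 0.0

# A substantially lower value for the SFT model than for the base model (on the same prompts)
# would be direct evidence for the low-frequency-neglect driver.
```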
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to provide additional empirical support and ablations as outlined.
Point-by-point responses
- Referee: The central claim that diversity decline is driven by neglect of low-frequency patterns and forgetting of preexisting knowledge (Abstract) lacks direct empirical support in the described experiments, such as token-frequency histograms of generations versus pretraining data or knowledge-probing accuracy before/after SFT. Without these measurements or ablations isolating the mechanisms, the link between the drivers and observed effects remains unverified, weakening the 'principled approach' framing for TOFU.
Authors: We appreciate the referee's emphasis on direct empirical validation. Our theoretical analysis in Section 3 derives the two mechanisms as primary drivers, and the experiments confirm both the post-SFT diversity narrowing and TOFU's benefits at scale. To strengthen the attribution, we will add token-frequency histograms of generations versus pretraining data and knowledge-probing accuracy measurements before and after SFT in the revised manuscript. These will provide the requested direct support for the mechanisms. (Revision: yes.)
- Referee: The experimental evaluation (described in the abstract as 'extensive evaluation at scale') demonstrates TOFU improves diversity while preserving quality, but does not include ablations comparing TOFU to standard focal loss, other regularizers, or variants that address only one driver. This makes it unclear whether gains target the stated challenges or arise incidentally, undermining the causal attribution.
Authors: We agree that isolating the contributions of TOFU's components is important for causal claims. We will incorporate ablations in the revised manuscript, including direct comparisons to standard focal loss, other regularizers (e.g., label smoothing), and TOFU variants that address only one driver (e.g., focal-only or tempering-only). These will demonstrate that the observed diversity improvements specifically arise from simultaneously targeting both low-frequency neglect and knowledge forgetting. (Revision: yes.)
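A toy sketch of what the single-ingredient ablation losses mentioned in the response above could look like, mirroring the hypothetical tempered-focal sketch earlier in this review; the formulas and the hyperparameters gamma and t are illustrative placeholders, not the paper's definitions.

```python
# Toy per-token loss forms for the promised ablation grid; gamma and t are placeholders
# and the formulas mirror the hypothetical sketch above, not the paper's definitions.
import math

def focal_only(p: float, gamma: float = 2.0) -> float:
    return -((1.0 - p) ** gamma) * math.log(p)        # focal weighting with an ordinary log

def tempered_only(p: float, t: float = 0.7) -> float:
    return -(p ** (1.0 - t) - 1.0) / (1.0 - t)        # tempered log, no focal weighting

def tempered_focal(p: float, gamma: float = 2.0, t: float = 0.7) -> float:
    return ((1.0 - p) ** gamma) * tempered_only(p, t) # both ingredients combined

# Running the same SFT recipe with plain cross-entropy, focal_only, tempered_only, and
# tempered_focal, then comparing diversity and quality metrics, would isolate which
# ingredient drives the reported gains.
```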
Circularity Check
No significant circularity; derivation is self-contained
Full rationale
The paper's central claims rest on a stated theoretical analysis identifying two drivers of diversity decline (neglect of low-frequency patterns and forgetting of preexisting knowledge), followed by empirical validation of the TOFU loss. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains are present in the abstract or described structure that would reduce any result to its inputs by construction. The approach is motivated by analysis but evaluated independently on external benchmarks, so the argument is checked against measurements rather than being tautological.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: SFT suppresses generative diversity primarily via neglect of low-frequency patterns and forgetting of preexisting knowledge.
Invented entities (1)
- Tempered Focal (TOFU) loss (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  Unclear relation between the paper passage and the cited Recognition theorem.
  Paper passage: "We attribute the decline to two primary drivers: the neglect of low-frequency patterns within fine-tuning datasets and the forgetting of preexisting knowledge. ... Tempered Focal (TOFU) loss"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
  Unclear relation between the paper passage and the cited Recognition theorem.
  Paper passage: "Theorem 3.1 (GEM loss equivalence) ... ∇_θ L_GEM(θ) = ∇_θ L^β_CE(θ)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.