Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression

Jianwei Yin; Kyusong Lee; Leigang Sha; Ruochen Xu; Tiancheng Zhao; Yutao Sun; Zilun Zhang

arXiv: 2406.11354 · v3 · submitted 2024-06-17 · cs.CL · cs.AI · cs.CV

Zilun Zhang, Yutao Sun, Tiancheng Zhao, Leigang Sha, Ruochen Xu, Kyusong Lee, Jianwei Yin

Pith reviewed 2026-05-24 00:08 UTC · model grok-4.3

classification cs.CL · cs.AI · cs.CV

keywords catastrophic forgetting · large language models · multimodal LLMs · self-decompression · Tree Generation · supervised fine-tuning · knowledge preservation · instruction tuning

The pith

Tree Generation creates synthetic instruction data from an LLM that, when mixed into SFT, reduces language forgetting in multimodal models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models lose prior knowledge when fine-tuned on new tasks, and multimodal versions suffer extra decline on pure language benchmarks. The paper presents Tree Generation as a way to unpack an LLM's existing knowledge into a reusable corpus of synthetic supervised fine-tuning examples. Adding this corpus during later instruction tuning measurably limits the drop in language performance. A sympathetic reader would see this as a practical route to update models on domain data without having to store or replay the entire original pretraining set.

Core claim

Tree Generation (TG) is a model-agnostic self-decompression procedure that converts the parametric knowledge inside an LLM into an explicit training corpus by producing synthetic instruction-response pairs. TG-SFT applies this corpus during supervised fine-tuning of multimodal LLMs; the resulting models exhibit substantially less degradation on language-only benchmarks than models trained on the same target data without the added corpus.

What carries the argument

Tree Generation (TG), a procedure that synthetically expands an LLM's internal knowledge into instruction-tuning examples for later reuse.
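
As an illustration only, a recursive expansion of this kind could be sketched in Python as follows. This page does not state the paper's prompts, branching factor, or stopping rule, so `Node`, `expand`, `flatten`, the follow-up prompt wording, and the `toy_model` stand-in are all assumptions made for the sketch, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interface: takes a prompt string, returns the model's reply.
Generate = Callable[[str], str]


@dataclass
class Node:
    question: str
    answer: str = ""
    children: List["Node"] = field(default_factory=list)


def expand(node: Node, generate: Generate, depth: int, branching: int) -> None:
    """Answer the node's question, then recursively ask follow-up questions."""
    node.answer = generate(node.question)
    if depth <= 1:
        return
    for i in range(branching):
        # Assumed follow-up prompt; the paper's actual prompts may differ.
        follow_up = generate(f"Ask follow-up question #{i + 1} about: {node.answer}")
        child = Node(question=follow_up)
        node.children.append(child)
        expand(child, generate, depth - 1, branching)


def flatten(node: Node) -> List[Tuple[str, str]]:
    """Collect every (instruction, response) pair in the tree, depth-first."""
    pairs = [(node.question, node.answer)]
    for child in node.children:
        pairs.extend(flatten(child))
    return pairs


def toy_model(prompt: str) -> str:
    # Stand-in for a real LLM call so the sketch runs offline.
    return f"<reply to: {prompt[:30]}>"


root = Node(question="What is type 2 diabetes?")
expand(root, toy_model, depth=3, branching=2)
corpus = flatten(root)
print(len(corpus))  # a complete tree with 3 layers and 2 branches has 7 nodes
```

Swapping `toy_model` for a call to an actual LLM would be the only change needed to produce real text; whether the resulting pairs resemble the paper's corpus depends on details this page does not give.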

If this is right

MLLMs can acquire visual capabilities while retaining more of their original language competence.
The same decompression step can be applied to plain LLMs before any domain-specific fine-tuning.
Once generated, the synthetic corpus can be stored and reused across multiple downstream fine-tuning runs without additional model queries.
The approach requires no changes to model architecture or training objective beyond data mixture.
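
If the only change is the data mixture, the mixing step could look roughly like the Python sketch below. The function name `mix_corpora`, the `tg_fraction` parameter, and the `source` field are hypothetical; this page does not report the ratio the paper actually used.

```python
import random
from typing import Dict, List

Example = Dict[str, str]


def mix_corpora(
    target: List[Example],
    tg: List[Example],
    tg_fraction: float,
    seed: int = 0,
) -> List[Example]:
    """Keep all target examples and add TG examples until TG makes up
    roughly `tg_fraction` of the mixed set (capped by the TG corpus size)."""
    if not 0.0 <= tg_fraction < 1.0:
        raise ValueError("tg_fraction must be in [0, 1)")
    rng = random.Random(seed)
    n_tg = round(len(target) * tg_fraction / (1.0 - tg_fraction))
    n_tg = min(n_tg, len(tg))
    mixed = list(target) + rng.sample(tg, n_tg)
    rng.shuffle(mixed)  # interleave so batches see both sources
    return mixed


# Toy data standing in for the target (e.g. multimodal) set and the TG corpus.
target_data = [{"source": "target", "text": f"t{i}"} for i in range(80)]
tg_data = [{"source": "tg", "text": f"g{i}"} for i in range(100)]

mixed = mix_corpora(target_data, tg_data, tg_fraction=0.2)
print(len(mixed), sum(ex["source"] == "tg" for ex in mixed))
```

Fixing the seed keeps the sampled subset reproducible, which matters if the same stored corpus is reused across several fine-tuning runs.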

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Continual-learning pipelines could replace replay buffers with periodically regenerated TG corpora.
The method might extend to other modalities if the base model can be prompted to produce cross-modal instruction pairs.
If the synthetic data proves high-fidelity, pretraining checkpoints could be discarded after a single TG pass, lowering storage costs.

Load-bearing premise

The synthetic examples generated by Tree Generation accurately capture the original LLM's knowledge without distortion or loss of fidelity.

What would settle it

Training an MLLM on target data plus the TG corpus and observing no improvement, or a larger drop, on language benchmarks relative to target data alone would falsify the central claim.
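
The comparison described above can be written down as a small check. The Python sketch below is hypothetical throughout: the benchmark names and every score are made-up placeholders chosen to make the arithmetic easy to follow, not results from the paper.

```python
from statistics import mean
from typing import Dict

Scores = Dict[str, float]


def mean_drop(base: Scores, tuned: Scores) -> float:
    """Average score lost relative to the base LLM (positive = forgetting)."""
    return mean(base[name] - tuned[name] for name in base)


def claim_survives(base: Scores, sft_only: Scores, tg_sft: Scores) -> bool:
    """True if adding the TG corpus yields a smaller average drop than
    training on the target data alone."""
    return mean_drop(base, tg_sft) < mean_drop(base, sft_only)


# Placeholder numbers for illustration only, NOT taken from the paper.
base = {"bench_a": 60.0, "bench_b": 50.0}
sft_only = {"bench_a": 54.0, "bench_b": 46.0}
tg_sft = {"bench_a": 58.0, "bench_b": 49.0}

print(mean_drop(base, sft_only))   # 5.0
print(mean_drop(base, tg_sft))     # 1.5
print(claim_survives(base, sft_only, tg_sft))  # True
```

A real test would also need several seeds and error bars before treating a difference in mean drop as meaningful.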

Figures

Figures reproduced from arXiv: 2406.11354 by Jianwei Yin, Kyusong Lee, Leigang Sha, Ruochen Xu, Tiancheng Zhao, Yutao Sun, Zilun Zhang.

**Figure 1.** The motivation of Our Work. Shadow represents the error bar. The SFT of MLLM harms the language ability of its LLM backbone (MLLM has begun to forget its general language ability while training is processed). We choose the LLaMA2-7B-chat model as the LLM backbone for the experiments. Details of this experiment can be found in Appendix A.1. The first data point is evaluated from the checkpoint of 3000 ste…

**Figure 2.** TG-SFT structure overview, illustrating a three-layer complete tree structure. In practice, the depth of …

**Figure 3.** 𝑆𝐷 [INST] <<SYS>> A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user’s questions. The topic of this conversation will be focused on all kinds of world knowledge. <</SYS>> What is type 2 diabetes? What are its causes? [/INST] Type 2 diabetes is a prevalent and complex metabolic disorder that affects how our body regu…

**Figure 5.** Number of turns in TG-SFT decompressed data. … the 2-turn corpus achieves the best performance in LLM benchmarks compared to the other two configurations. This could be attributed to the Gturn corpus being too diverse in context length and the 1-turn corpus being too short, which harms the model during SFT. …

Original abstract

Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal Large Language Models (MLLMs) which are composed of the LLM base and visual projector (e.g. LLaVA), a significant decline in performance on language benchmarks was observed compared to their single-modality counterparts. To address these challenges, we introduce a novel model-agnostic self-decompression method, Tree Generation (TG), that decompresses knowledge within LLMs into the training corpus. This paper focuses on TG-SFT, which can synthetically generate SFT data for the instruction tuning steps. By incorporating the dumped corpus during SFT for MLLMs, we significantly reduce the forgetting problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

Tree Generation is pitched as a model-agnostic way to turn LLM internal knowledge into synthetic SFT data that preserves language performance in MLLMs, but the description supplies zero numbers or checks.

The paper's main point is that LLMs lose old knowledge during fine-tuning or when turned into MLLMs, and their Tree Generation (TG) method decompresses that knowledge into a corpus of synthetic instruction examples that can be mixed back in during SFT to limit the damage. The claim is that this TG-SFT step lets the model keep language benchmark scores while adding multimodal capabilities. The tree-structured generation is presented as the novel, model-agnostic piece. That framing is at least a clean way to package self-generated replay data for the continual-learning setting.

The work also correctly flags the practical drop in language performance that shows up in models like LLaVA compared with their base LLMs. That observation matches what many practitioners see.

The soft spots are large and central. No quantitative results appear at all: no before-and-after scores on standard language benchmarks, no comparison to replay, regularization, or other anti-forgetting baselines, and no protocol for how the tree is constructed or how many examples are produced. The key assumption that the generated corpus faithfully reinforces rather than distorts the original knowledge is stated but not tested; nothing is said about perplexity checks, knowledge probes, or human inspection of the synthetic pairs. If the generation step drops rare facts or adds hallucinations, the later SFT step would be expected to hurt rather than help. The stress-test concern lands directly on the evidence that is missing.

This is aimed at researchers working on continual learning or multimodal fine-tuning who might want to experiment with the tree idea themselves. A reader looking for a method with reproducible gains or clear comparisons will not find it. The paper is not ready for peer review until it includes the missing experiments and verification steps.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a model-agnostic self-decompression method called Tree Generation (TG) that extracts knowledge from an LLM into a synthetic training corpus. It focuses on the TG-SFT variant, which generates instruction-tuning data; the central claim is that incorporating this corpus during supervised fine-tuning of MLLMs (e.g., LLaVA) significantly reduces catastrophic forgetting on language benchmarks relative to standard SFT.

Significance. If the empirical results hold, the approach would supply a parameter-free, model-internal mechanism for preserving base-LLM capabilities when extending to multimodal settings. This addresses a documented practical limitation of current MLLMs without requiring external data or architectural changes.

major comments (2)

[Abstract] Abstract: the claim that TG-SFT 'significantly reduce[s] the forgetting problem' is stated without any quantitative results, baselines, metrics, or experimental protocol. No numbers, tables, or figures are referenced to support the reduction.
The method's validity rests on the unverified assumption that Tree Generation produces synthetic SFT examples whose factual and reasoning content matches the original LLM without systematic omission or hallucination. No fidelity checks (perplexity on held-out pre-training data, knowledge-probe accuracy, or generated-vs-human pair comparison) are described.

minor comments (1)

[Abstract] The phrase 'dumped corpus' is used without a concise definition or high-level description of the decompression procedure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and the need to substantiate the core assumptions of Tree Generation. We address each major comment below.

Referee: [Abstract] Abstract: the claim that TG-SFT 'significantly reduce[s] the forgetting problem' is stated without any quantitative results, baselines, metrics, or experimental protocol. No numbers, tables, or figures are referenced to support the reduction.

Authors: We agree the abstract would be strengthened by referencing the supporting evidence. In the revised manuscript we will update the abstract to cite the key quantitative findings (e.g., the measured reduction in language-benchmark degradation relative to standard SFT), the evaluation metrics, and the relevant tables/figures from the experimental section.

revision: yes

Referee: [—] The method's validity rests on the unverified assumption that Tree Generation produces synthetic SFT examples whose factual and reasoning content matches the original LLM without systematic omission or hallucination. No fidelity checks (perplexity on held-out pre-training data, knowledge-probe accuracy, or generated-vs-human pair comparison) are described.

Authors: The recursive tree-expansion procedure is intended to elicit comprehensive knowledge from the source LLM, and the observed reduction in catastrophic forgetting supplies indirect support for the quality of the generated data. We acknowledge that explicit fidelity verification was omitted from the initial submission. In revision we will add a dedicated limitations paragraph discussing this assumption and will report basic fidelity metrics (e.g., knowledge-probe accuracy on held-out queries) using the generated corpus.

revision: partial
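
One simple reading of "knowledge-probe accuracy on held-out queries" is sketched below in Python. Neither the referee nor the simulated rebuttal defines the metric, so the substring-matching rule, the names `probe_accuracy` and `normalize`, and the two toy probes are assumptions made purely for illustration.

```python
from typing import Callable, List, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores formatting."""
    return " ".join(text.lower().split())


def probe_accuracy(
    answer_fn: Callable[[str], str],
    probes: List[Tuple[str, str]],
) -> float:
    """Fraction of probes whose expected answer appears in the reply.
    Substring matching is an assumed scoring rule, not the paper's."""
    if not probes:
        raise ValueError("need at least one probe")
    hits = sum(
        normalize(expected) in normalize(answer_fn(question))
        for question, expected in probes
    )
    return hits / len(probes)


# Toy probes and canned replies standing in for a model trained on the corpus.
probes = [
    ("What is the capital of France?", "Paris"),
    ("Which planet is the largest?", "Jupiter"),
]
canned = {
    "What is the capital of France?": "The capital is Paris.",
    "Which planet is the largest?": "I think it is Saturn.",
}


def toy_answer(question: str) -> str:
    return canned[question]


print(probe_accuracy(toy_answer, probes))  # 0.5
```

Running the same probes against the source LLM and against a model fine-tuned with the generated corpus would give two numbers to compare, which is the kind of direct fidelity evidence the referee asks for.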

Circularity Check

0 steps flagged

No circularity: method is a proposed generation procedure whose efficacy is claimed to be shown empirically

The provided abstract and description outline a new Tree Generation procedure that produces synthetic SFT data from an LLM, followed by an empirical claim that including this data during MLLM fine-tuning reduces forgetting. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the given text. The derivation chain does not reduce any result to its own inputs by construction; the central claim remains an external empirical assertion about the generated corpus's effect.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on free parameters, axioms, or invented entities.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 29 internal anchors

[1]

online" 'onlinestring :=

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint eprinttype howpublished institution journal key month note number organization pages publisher school series title type volume year doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRING...

work page
[2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page
[3]

Aleixo, Juan G

Everton L. Aleixo, Juan G. Colonna, Marco Cristo, and Everlandio Fernandes. 2023. https://arxiv.org/abs/2312.10549 Catastrophic forgetting in deep learning: A comprehensive taxonomy . Preprint, arXiv:2312.10549

work page arXiv 2023
[4]

AI Anthropic. 2024. The claude 3 model family: Opus, sonnet, haiku. Claude-3 Model Card

work page 2024
[5]

Llemma: An Open Language Model For Mathematics

Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, and Sean Welleck. 2024. https://arxiv.org/abs/2310.10631 Llemma: An open language model for mathematics . Preprint, arXiv:2310.10631

work page internal anchor Pith review Pith/arXiv arXiv 2024
[6]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin...

work page internal anchor Pith review Pith/arXiv arXiv 2020
[7]

Carbonell and Jade Goldstein

Jaime G. Carbonell and Jade Goldstein. 2017. The use of mmr, diversity-based reranking for reordering documents and producing summaries. SIGIR Forum , 51(2):209--210

work page 2017
[8]

Ted Chiang. 2023. https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web Chatgpt is a blurry jpeg of the web

work page 2023
[9]

Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. 2018. https://arxiv.org/abs/1803.05457 Think you have solved question answering? try arc, the ai2 reasoning challenge . Preprint, arXiv:1803.05457

work page internal anchor Pith review Pith/arXiv arXiv 2018
[10]

Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, and Joel Veness. 2024. https://arxiv.org/abs/2309.10668 Language modeling is compression . Preprint, arXiv:2309.10668

work page internal anchor Pith review Pith/arXiv arXiv 2024
[11]

Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, and Xuanjing Huang. 2024. https://arxiv.org/abs/2312.09979 Loramoe: Alleviate world knowledge forgetting in large language models via moe-style plugin . Preprint, arXiv:2312.09979

work page arXiv 2024
[12]

Matthew Finlayson, Xiang Ren, and Swabha Swayamdipta. 2024. https://arxiv.org/abs/2403.09539 Logits of api-protected llms leak proprietary information . Preprint, arXiv:2403.09539

work page arXiv 2024
[13]

Leo Gao, Jonathan Tow, Baber Abbasi, Stella Biderman, Sid Black, Anthony DiPofi, Charles Foster, Laurence Golding, Jeffrey Hsu, Alain Le Noac'h, Haonan Li, Kyle McDonell, Niklas Muennighoff, Chris Ociepa, Jason Phang, Laria Reynolds, Hailey Schoelkopf, Aviya Skowron, Lintang Sutawika, Eric Tang, Anish Thite, Ben Wang, Kevin Wang, and Andy Zou. 2023. https...

work page doi:10.5281/zenodo.10256836 2023
[14]

Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, and Devi Parikh. 2017. https://arxiv.org/abs/1612.00837 Making the v in vqa matter: Elevating the role of image understanding in visual question answering . Preprint, arXiv:1612.00837

work page internal anchor Pith review Pith/arXiv arXiv 2017
[15]

Yuxian Gu, Li Dong, Yaru Hao, Qingxiu Dong, Minlie Huang, and Furu Wei. 2024. https://arxiv.org/abs/2402.17759 Towards optimal learning of language models . Preprint, arXiv:2402.17759

work page arXiv 2024
[16]

Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, and Yuanzhi Li. 2023. https://arxiv.org/abs/2306.11644 Textbooks are...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[17]

VizWiz Grand Challenge: Answering Visual Questions from Blind People

Danna Gurari, Qing Li, Abigale J. Stangl, Anhong Guo, Chi Lin, Kristen Grauman, Jiebo Luo, and Jeffrey P. Bigham. 2018. https://arxiv.org/abs/1802.08218 Vizwiz grand challenge: Answering visual questions from blind people . Preprint, arXiv:1802.08218

work page internal anchor Pith review Pith/arXiv arXiv 2018
[18]

Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. 2020. https://arxiv.org/abs/2004.10964 Don't stop pretraining: Adapt language models to domains and tasks . Preprint, arXiv:2004.10964

work page arXiv 2020
[19]

Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2021. https://arxiv.org/abs/2009.03300 Measuring massive multitask language understanding . Preprint, arXiv:2009.03300

work page internal anchor Pith review Pith/arXiv arXiv 2021
[20]

Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, and Tomas Pfister. 2023. https://arxiv.org/abs/2305.02301 Distilling step-by-step! outperforming larger language models with less training data and smaller model sizes . Preprint, arXiv:2305.02301

work page internal anchor Pith review Pith/arXiv arXiv 2023
[21]

LoRA: Low-Rank Adaptation of Large Language Models

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. https://arxiv.org/abs/2106.09685 Lora: Low-rank adaptation of large language models . Preprint, arXiv:2106.09685

work page internal anchor Pith review Pith/arXiv arXiv 2021
[22]

Quzhe Huang, Mingxu Tao, Chen Zhang, Zhenwei An, Cong Jiang, Zhibin Chen, Zirui Wu, and Yansong Feng. 2023. https://arxiv.org/abs/2305.15062 Lawyer llama technical report . Preprint, arXiv:2305.15062

work page arXiv 2023
[23]

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

Drew A. Hudson and Christopher D. Manning. 2019. https://arxiv.org/abs/1902.09506 Gqa: A new dataset for real-world visual reasoning and compositional question answering . Preprint, arXiv:1902.09506

work page internal anchor Pith review Pith/arXiv arXiv 2019
[24]

Eric Jang. 2023. https://evjang.com/2023/03/26/self-reflection.html Can llms critique and iterate on their own outputs? evjang.com

work page 2023
[25]

Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, and Robert McHardy. 2023. https://arxiv.org/abs/2307.10169 Challenges and applications of large language models . Preprint, arXiv:2307.10169

work page arXiv 2023
[26]

Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer, Abhijit Das, Salman Khan, and Fahad Shahbaz Khan. 2023. https://arxiv.org/abs/2311.15826 Geochat: Grounded large vision-language model for remote sensing . Preprint, arXiv:2311.15826

work page arXiv 2023
[27]

Mahoney, Kurt Keutzer, and Amir Gholami

Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, Gopala Anumanchipali, Michael W. Mahoney, Kurt Keutzer, and Amir Gholami. 2024. https://arxiv.org/abs/2403.15042 Llm2llm: Boosting llms with novel iterative data enhancement . Preprint, arXiv:2403.15042

work page arXiv 2024
[28]

Bo Li*, Kaichen Zhang* Peiyuan Zhang*, Fanyi Pu*, Xinrun Du, Yuhao Dong, Haotian Liu, Yuanhan Zhang, Ge Zhang, Chunyuan Li, and Ziwei Liu. 2024. https://github.com/EvolvingLMMs-Lab/lmms-eval Lmms-eval: Accelerating the development of large multimoal models

work page 2024
[29]

Bohao Li, Rui Wang, Guangzhi Wang, Yuying Ge, Yixiao Ge, and Ying Shan. 2023 a . https://arxiv.org/abs/2307.16125 Seed-bench: Benchmarking multimodal llms with generative comprehension . Preprint, arXiv:2307.16125

work page internal anchor Pith review Pith/arXiv arXiv 2023
[30]

Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, and Jianfeng Gao. 2023 b . Llava-med: Training a large language-and-vision assistant for biomedicine in one day. arXiv preprint arXiv:2306.00890

work page internal anchor Pith review Pith/arXiv arXiv 2023
[31]

Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Wayne Xin Zhao, and Ji-Rong Wen. 2023 c . https://arxiv.org/abs/2305.10355 Evaluating object hallucination in large vision-language models . Preprint, arXiv:2305.10355

work page internal anchor Pith review Pith/arXiv arXiv 2023
[32]

Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, and Yin Tat Lee. 2023 d . https://arxiv.org/abs/2309.05463 Textbooks are all you need ii: phi-1.5 technical report . Preprint, arXiv:2309.05463

work page internal anchor Pith review Pith/arXiv arXiv 2023
[33]

Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, and Ming Yin. 2023 e . https://arxiv.org/abs/2310.07849 Synthetic data generation with large language models for text classification: Potential and limitations . Preprint, arXiv:2310.07849

work page arXiv 2023
[34]

Stephanie Lin, Jacob Hilton, and Owain Evans. 2022. https://arxiv.org/abs/2109.07958 Truthfulqa: Measuring how models mimic human falsehoods . Preprint, arXiv:2109.07958

work page internal anchor Pith review Pith/arXiv arXiv 2022
[35]

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. 2023. Visual instruction tuning

work page 2023
[36]

Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, and Andrew M. Dai. 2024 a . https://arxiv.org/abs/2404.07503 Best practices and lessons learned on synthetic data for language models . Preprint, arXiv:2404.07503

work page arXiv 2024
[37]

Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, and Dahua Lin. 2024 b . https://arxiv.org/abs/2307.06281 Mmbench: Is your multi-modal model an all-around player? Preprint, arXiv:2307.06281

work page internal anchor Pith review Pith/arXiv arXiv 2024
[38]

Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, and Ashwin Kalyan. 2022. https://arxiv.org/abs/2209.09513 Learn to explain: Multimodal reasoning via thought chains for science question answering . Preprint, arXiv:2209.09513

work page arXiv 2022
[39]

Yun Luo, Zhen Yang, Fandong Meng, Yafu Li, Jie Zhou, and Yue Zhang. 2024. https://arxiv.org/abs/2308.08747 An empirical study of catastrophic forgetting in large language models during continual fine-tuning . Preprint, arXiv:2308.08747

work page internal anchor Pith review Pith/arXiv arXiv 2024
[40]

Sourab Mangrulkar, Sylvain Gugger, Lysandre Debut, Younes Belkada, Sayak Paul, and Benjamin Bossan. 2022. Peft: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft

work page 2022
[41]

Arindam Mitra, Hamed Khanpour, Corby Rosset, and Ahmed Awadallah. 2024. https://arxiv.org/abs/2402.14830 Orca-math: Unlocking the potential of slms in grade school math . Preprint, arXiv:2402.14830

work page arXiv 2024
[42]

Scalable Extraction of Training Data from (Production) Language Models

Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, and Katherine Lee. 2023. https://arxiv.org/abs/2311.17035 Scalable extraction of training data from (production) language models . Preprint, arXiv:2311.17035

work page internal anchor Pith review Pith/arXiv arXiv 2023
[43]

OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[44]

Ankit Patel. 2024. N V I D I A R eleases O pen S ynthetic D ata G eneration P ipeline for T raining L arge L anguage M odels --- blogs.nvidia.com. https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm-training/. [Accessed 15-06-2024]

work page 2024
[45]

Jack Rae. 2023. chttps://www.youtube.com/watch?v=dO4TPJkeaaU Compression for agi

work page 2023
[46]

Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. In EMNLP/IJCNLP (1) , pages 3980--3990. Association for Computational Linguistics

work page 2019
[47]

Nils Reimers and Iryna Gurevych. 2020. https://arxiv.org/abs/2004.09813 Making monolingual sentence embeddings multilingual using knowledge distillation . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics

work page arXiv 2020
[48]

Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, J \'e r \'e my Rapin, et al. 2023. Code llama: Open foundation models for code. arXiv preprint arXiv:2308.12950

work page internal anchor Pith review Pith/arXiv arXiv 2023
[49]

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. 2019. https://arxiv.org/abs/1907.10641 Winogrande: An adversarial winograd schema challenge at scale . Preprint, arXiv:1907.10641

work page internal anchor Pith review Pith/arXiv arXiv 2019
[50]

Siu, Byron C

Chantal Shaib, Joe Barrow, Jiuding Sun, Alexa F. Siu, Byron C. Wallace, and Ani Nenkova. 2024. https://arxiv.org/abs/2403.00553 Standardizing the measurement of text diversity: A tool and a comparative analysis of scores . Preprint, arXiv:2403.00553

work page arXiv 2024
[51]

Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, and Marcus Rohrbach. 2019. https://arxiv.org/abs/1904.08920 Towards vqa models that can read . Preprint, arXiv:1904.08920

work page internal anchor Pith review Pith/arXiv arXiv 2019
[52]

Ilya Sutskever. 2023. https://www.youtube.com/watch?v=Yf1o0TQzry8 Ilya sutskever (openai chief scientist) - building agi, alignment, spies, microsoft, & enlightenment

work page 2023
[53]

Gemini Team, Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry, Lepikhin, Timothy Lillicrap, Jean baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, Ioannis Antonoglou, Rohan Anil, Sebastian Borgeaud, Andrew Dai, Katie Millican, Ethan Dyer, Mia Glaese, Thibault Sottiaux, Benjamin Lee, Fabio Viola, Malcolm Reynolds,...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[54]

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Harts...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[55]

Laurens van der Maaten and Geoffrey Hinton. 2008. http://jmlr.org/papers/v9/vandermaaten08a.html Visualizing data using t-sne . Journal of Machine Learning Research, 9(86):2579--2605

work page 2008
[56]

Xiao Wang, Tianze Chen, Qiming Ge, Han Xia, Rong Bao, Rui Zheng, Qi Zhang, Tao Gui, and Xuanjing Huang. 2023. https://arxiv.org/abs/2310.14152 Orthogonal subspace learning for language model continual learning . Preprint, arXiv:2310.14152

work page arXiv 2023
[57]

Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, and Quoc V. Le. 2024. https://arxiv.org/abs/2403.18802 Long-form factuality in large language models . Preprint, arXiv:2403.18802

work page arXiv 2024
[58]

Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ying Shan, and Ping Luo. 2024. https://arxiv.org/abs/2401.02415 Llama pro: Progressive llama with block expansion . Preprint, arXiv:2401.02415

work page arXiv 2024
[59]

Can Xu, Qingfeng Sun, Kai Zheng, Xiubo Geng, Pu Zhao, Jiazhan Feng, Chongyang Tao, and Daxin Jiang. 2023. https://arxiv.org/abs/2304.12244 Wizardlm: Empowering large language models to follow complex instructions . Preprint, arXiv:2304.12244

work page internal anchor Pith review Pith/arXiv arXiv 2023
[60]

Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Yejin Choi, and Bill Yuchen Lin. 2024. https://arxiv.org/abs/2406.08464 Magpie: Alignment data synthesis from scratch by prompting aligned llms with nothing . Preprint, arXiv:2406.08464

work page internal anchor Pith review Pith/arXiv arXiv 2024
[61]

Yibo Yang, Stephan Mandt, and Lucas Theis. 2023. https://arxiv.org/abs/2202.06533 An introduction to neural data compression . Preprint, arXiv:2202.06533

work page arXiv 2023
[62]

Zhaorui Yang, Tianyu Pang, Haozhe Feng, Han Wang, Wei Chen, Minfeng Zhu, and Qian Liu. 2024. https://arxiv.org/abs/2402.13669 Self-distillation bridges distribution gap in language model fine-tuning . Preprint, arXiv:2402.13669

work page arXiv 2024
[63]

Asaf Yehudai, Boaz Carmeli, Yosi Mass, Ofir Arviv, Nathaniel Mills, Assaf Toledo, Eyal Shnarch, and Leshem Choshen. 2024. https://arxiv.org/abs/2401.14367 Genie: Achieving human parity in content-grounded datasets generation . Preprint, arXiv:2401.14367

work page arXiv 2024
[64]

Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. 2024. https://arxiv.org/abs/2306.13549 A survey on multimodal large language models . Preprint, arXiv:2306.13549

work page internal anchor Pith review Pith/arXiv arXiv 2024
[65]

Li Yunxiang, Li Zihan, Zhang Kai, Dan Ruilong, and Zhang You. 2023. Chatdoctor: A medical chat model fine-tuned on llama model using medical domain knowledge. arXiv preprint arXiv:2303.14070

work page arXiv 2023
[66]

Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. 2019. https://arxiv.org/abs/1905.07830 Hellaswag: Can a machine really finish your sentence? Preprint, arXiv:1905.07830


[3]

Everton L. Aleixo, Juan G. Colonna, Marco Cristo, and Everlandio Fernandes. 2023. https://arxiv.org/abs/2312.10549 Catastrophic forgetting in deep learning: A comprehensive taxonomy . Preprint, arXiv:2312.10549

[4]

Anthropic. 2024. The claude 3 model family: Opus, sonnet, haiku. Claude-3 Model Card

[5]

Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, and Sean Welleck. 2024. https://arxiv.org/abs/2310.10631 Llemma: An open language model for mathematics . Preprint, arXiv:2310.10631

[6]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin...

[7]

Jaime G. Carbonell and Jade Goldstein. 2017. The use of mmr, diversity-based reranking for reordering documents and producing summaries. SIGIR Forum, 51(2):209–210

[8]

Ted Chiang. 2023. https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web Chatgpt is a blurry jpeg of the web

[9]

Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. 2018. https://arxiv.org/abs/1803.05457 Think you have solved question answering? try arc, the ai2 reasoning challenge . Preprint, arXiv:1803.05457

[10]

Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, and Joel Veness. 2024. https://arxiv.org/abs/2309.10668 Language modeling is compression . Preprint, arXiv:2309.10668

[11]

Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, and Xuanjing Huang. 2024. https://arxiv.org/abs/2312.09979 Loramoe: Alleviate world knowledge forgetting in large language models via moe-style plugin . Preprint, arXiv:2312.09979

[12]

Matthew Finlayson, Xiang Ren, and Swabha Swayamdipta. 2024. https://arxiv.org/abs/2403.09539 Logits of api-protected llms leak proprietary information . Preprint, arXiv:2403.09539

[13]

Leo Gao, Jonathan Tow, Baber Abbasi, Stella Biderman, Sid Black, Anthony DiPofi, Charles Foster, Laurence Golding, Jeffrey Hsu, Alain Le Noac'h, Haonan Li, Kyle McDonell, Niklas Muennighoff, Chris Ociepa, Jason Phang, Laria Reynolds, Hailey Schoelkopf, Aviya Skowron, Lintang Sutawika, Eric Tang, Anish Thite, Ben Wang, Kevin Wang, and Andy Zou. 2023. https...

[14]

Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, and Devi Parikh. 2017. https://arxiv.org/abs/1612.00837 Making the v in vqa matter: Elevating the role of image understanding in visual question answering . Preprint, arXiv:1612.00837

[15]

Yuxian Gu, Li Dong, Yaru Hao, Qingxiu Dong, Minlie Huang, and Furu Wei. 2024. https://arxiv.org/abs/2402.17759 Towards optimal learning of language models . Preprint, arXiv:2402.17759

[16]

Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, and Yuanzhi Li. 2023. https://arxiv.org/abs/2306.11644 Textbooks are...

[17]

Danna Gurari, Qing Li, Abigale J. Stangl, Anhong Guo, Chi Lin, Kristen Grauman, Jiebo Luo, and Jeffrey P. Bigham. 2018. https://arxiv.org/abs/1802.08218 Vizwiz grand challenge: Answering visual questions from blind people . Preprint, arXiv:1802.08218

[18]

Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. 2020. https://arxiv.org/abs/2004.10964 Don't stop pretraining: Adapt language models to domains and tasks . Preprint, arXiv:2004.10964

[19]

Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2021. https://arxiv.org/abs/2009.03300 Measuring massive multitask language understanding . Preprint, arXiv:2009.03300

[20]

Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, and Tomas Pfister. 2023. https://arxiv.org/abs/2305.02301 Distilling step-by-step! outperforming larger language models with less training data and smaller model sizes . Preprint, arXiv:2305.02301

[21]

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. https://arxiv.org/abs/2106.09685 Lora: Low-rank adaptation of large language models . Preprint, arXiv:2106.09685

[22]

Quzhe Huang, Mingxu Tao, Chen Zhang, Zhenwei An, Cong Jiang, Zhibin Chen, Zirui Wu, and Yansong Feng. 2023. https://arxiv.org/abs/2305.15062 Lawyer llama technical report . Preprint, arXiv:2305.15062

[23]

Drew A. Hudson and Christopher D. Manning. 2019. https://arxiv.org/abs/1902.09506 Gqa: A new dataset for real-world visual reasoning and compositional question answering . Preprint, arXiv:1902.09506

[24]

Eric Jang. 2023. https://evjang.com/2023/03/26/self-reflection.html Can llms critique and iterate on their own outputs? evjang.com

[25]

Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, and Robert McHardy. 2023. https://arxiv.org/abs/2307.10169 Challenges and applications of large language models . Preprint, arXiv:2307.10169

[26]

Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer, Abhijit Das, Salman Khan, and Fahad Shahbaz Khan. 2023. https://arxiv.org/abs/2311.15826 Geochat: Grounded large vision-language model for remote sensing . Preprint, arXiv:2311.15826

[27]

Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, Gopala Anumanchipalli, Michael W. Mahoney, Kurt Keutzer, and Amir Gholami. 2024. https://arxiv.org/abs/2403.15042 Llm2llm: Boosting llms with novel iterative data enhancement. Preprint, arXiv:2403.15042

[28]

Bo Li*, Kaichen Zhang*, Peiyuan Zhang*, Fanyi Pu*, Xinrun Du, Yuhao Dong, Haotian Liu, Yuanhan Zhang, Ge Zhang, Chunyuan Li, and Ziwei Liu. 2024. https://github.com/EvolvingLMMs-Lab/lmms-eval Lmms-eval: Accelerating the development of large multimodal models

[29]

Bohao Li, Rui Wang, Guangzhi Wang, Yuying Ge, Yixiao Ge, and Ying Shan. 2023a. https://arxiv.org/abs/2307.16125 Seed-bench: Benchmarking multimodal llms with generative comprehension. Preprint, arXiv:2307.16125

[30]

Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, and Jianfeng Gao. 2023b. Llava-med: Training a large language-and-vision assistant for biomedicine in one day. arXiv preprint arXiv:2306.00890

[31]

Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Wayne Xin Zhao, and Ji-Rong Wen. 2023c. https://arxiv.org/abs/2305.10355 Evaluating object hallucination in large vision-language models. Preprint, arXiv:2305.10355

[32]

Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, and Yin Tat Lee. 2023d. https://arxiv.org/abs/2309.05463 Textbooks are all you need ii: phi-1.5 technical report. Preprint, arXiv:2309.05463

[33]

Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, and Ming Yin. 2023e. https://arxiv.org/abs/2310.07849 Synthetic data generation with large language models for text classification: Potential and limitations. Preprint, arXiv:2310.07849

[34]

Stephanie Lin, Jacob Hilton, and Owain Evans. 2022. https://arxiv.org/abs/2109.07958 Truthfulqa: Measuring how models mimic human falsehoods . Preprint, arXiv:2109.07958

[35]

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. 2023. Visual instruction tuning

[36]

Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, and Andrew M. Dai. 2024a. https://arxiv.org/abs/2404.07503 Best practices and lessons learned on synthetic data for language models. Preprint, arXiv:2404.07503

[37]

Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, and Dahua Lin. 2024b. https://arxiv.org/abs/2307.06281 Mmbench: Is your multi-modal model an all-around player? Preprint, arXiv:2307.06281

[38]

Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, and Ashwin Kalyan. 2022. https://arxiv.org/abs/2209.09513 Learn to explain: Multimodal reasoning via thought chains for science question answering . Preprint, arXiv:2209.09513

[39]

Yun Luo, Zhen Yang, Fandong Meng, Yafu Li, Jie Zhou, and Yue Zhang. 2024. https://arxiv.org/abs/2308.08747 An empirical study of catastrophic forgetting in large language models during continual fine-tuning . Preprint, arXiv:2308.08747

[40]

Sourab Mangrulkar, Sylvain Gugger, Lysandre Debut, Younes Belkada, Sayak Paul, and Benjamin Bossan. 2022. Peft: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft

[41]

Arindam Mitra, Hamed Khanpour, Corby Rosset, and Ahmed Awadallah. 2024. https://arxiv.org/abs/2402.14830 Orca-math: Unlocking the potential of slms in grade school math . Preprint, arXiv:2402.14830

[42]

Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, and Katherine Lee. 2023. https://arxiv.org/abs/2311.17035 Scalable extraction of training data from (production) language models . Preprint, arXiv:2311.17035

[43]

OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner...

[44]

Ankit Patel. 2024. NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models. blogs.nvidia.com. https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm-training/. [Accessed 15-06-2024]

[45]

Jack Rae. 2023. https://www.youtube.com/watch?v=dO4TPJkeaaU Compression for agi

[46]

Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. In EMNLP/IJCNLP (1), pages 3980–3990. Association for Computational Linguistics

[47]

Nils Reimers and Iryna Gurevych. 2020. https://arxiv.org/abs/2004.09813 Making monolingual sentence embeddings multilingual using knowledge distillation . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics

[48]

Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, et al. 2023. Code llama: Open foundation models for code. arXiv preprint arXiv:2308.12950

[49]

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. 2019. https://arxiv.org/abs/1907.10641 Winogrande: An adversarial winograd schema challenge at scale . Preprint, arXiv:1907.10641

[50]

Chantal Shaib, Joe Barrow, Jiuding Sun, Alexa F. Siu, Byron C. Wallace, and Ani Nenkova. 2024. https://arxiv.org/abs/2403.00553 Standardizing the measurement of text diversity: A tool and a comparative analysis of scores . Preprint, arXiv:2403.00553

[51]

Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, and Marcus Rohrbach. 2019. https://arxiv.org/abs/1904.08920 Towards vqa models that can read . Preprint, arXiv:1904.08920

[52]

Ilya Sutskever. 2023. https://www.youtube.com/watch?v=Yf1o0TQzry8 Ilya sutskever (openai chief scientist) - building agi, alignment, spies, microsoft, & enlightenment

[53]

Gemini Team, Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry Lepikhin, Timothy Lillicrap, Jean-Baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, Ioannis Antonoglou, Rohan Anil, Sebastian Borgeaud, Andrew Dai, Katie Millican, Ethan Dyer, Mia Glaese, Thibault Sottiaux, Benjamin Lee, Fabio Viola, Malcolm Reynolds,...

[54]

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Harts...

[55]

Laurens van der Maaten and Geoffrey Hinton. 2008. http://jmlr.org/papers/v9/vandermaaten08a.html Visualizing data using t-sne. Journal of Machine Learning Research, 9(86):2579–2605

[56]

Xiao Wang, Tianze Chen, Qiming Ge, Han Xia, Rong Bao, Rui Zheng, Qi Zhang, Tao Gui, and Xuanjing Huang. 2023. https://arxiv.org/abs/2310.14152 Orthogonal subspace learning for language model continual learning . Preprint, arXiv:2310.14152

[57]

Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, and Quoc V. Le. 2024. https://arxiv.org/abs/2403.18802 Long-form factuality in large language models . Preprint, arXiv:2403.18802

[58]

Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ying Shan, and Ping Luo. 2024. https://arxiv.org/abs/2401.02415 Llama pro: Progressive llama with block expansion . Preprint, arXiv:2401.02415
