Recognition: no theorem link
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Pith reviewed 2026-05-16 06:54 UTC · model grok-4.3
The pith
Prompting aligned LLMs like Llama-3-Instruct with only left-side conversation templates produces millions of realistic user queries and responses for alignment training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By feeding aligned LLMs only the left-side templates up to the position reserved for user messages, the models generate realistic user queries and then provide responses to them. This process extracts four million instructions and responses from Llama-3-Instruct; after filtering, three hundred thousand high-quality examples are retained. Fine-tuning Llama-3-8B-Base on these examples produces models that match the official Llama-3-8B-Instruct on some tasks and outperform earlier public datasets even when used solely for supervised fine-tuning, as measured on AlpacaEval, ArenaHard, and WildBench.
What carries the argument
Magpie synthesis: supplying an aligned LLM with only the left side of its chat template (an optional system prompt plus the user-turn header, stopping at the position reserved for the user message) so that autoregression fills in a plausible user query; the query is then wrapped in the full template and the same model generates its response.
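For concreteness, here is a minimal sketch of that two-step extraction using the Hugging Face transformers API and the Llama-3 chat template; the model name, sampling settings, and stopping behavior are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch of Magpie-style self-synthesis (assumptions: HF transformers,
# Llama-3-8B-Instruct weights, default sampling; not the paper's exact settings).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Step 1: feed only the left-side template up to the user-message slot.
# The tokenizer prepends <|begin_of_text|>; the aligned model then
# autocompletes a plausible user query from its instruction-tuning prior.
pre_query = "<|start_header_id|>user<|end_header_id|>\n\n"
ids = tok(pre_query, return_tensors="pt").to(model.device)
gen = model.generate(**ids, max_new_tokens=256, do_sample=True, temperature=1.0)
query = tok.decode(gen[0, ids["input_ids"].shape[1]:], skip_special_tokens=True)

# Step 2: wrap the generated query in the full chat template and let the same
# model answer it, yielding one synthetic (instruction, response) pair.
chat_ids = tok.apply_chat_template(
    [{"role": "user", "content": query}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
gen = model.generate(chat_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
response = tok.decode(gen[0, chat_ids.shape[1]:], skip_special_tokens=True)

print({"instruction": query, "response": response})
```

Repeating step 1 with high-temperature sampling is what lets a single aligned checkpoint emit millions of distinct queries without any human-written prompts.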
If this is right
- Models trained only on Magpie data for supervised fine-tuning can equal or exceed the official Llama-3-8B-Instruct on selected alignment benchmarks, despite the official model being trained on ten million data points plus subsequent feedback learning.
- A single round of Magpie data alone surpasses prior public instruction sets that combined supervised fine-tuning with direct preference optimization on UltraFeedback.
- The method scales to four million generated examples from one aligned model with no additional human prompting effort.
- High-quality subsets of the synthetic data suffice for competitive downstream performance after standard filtering.
Where Pith is reading between the lines
- The same left-side prompting could be iterated on the newly aligned model itself to create successive rounds of self-generated data.
- Because the approach relies only on an already-aligned open model, it could be applied to other base models or languages to produce domain-specific alignment sets without new human effort.
- The gap between generated data volume and retained high-quality subset suggests that automatic quality filters remain a critical control point for scaling.
Load-bearing premise
The queries the model invents from partial templates are diverse enough and close enough to real user needs to produce effective alignment after filtering.
What would settle it
Fine-tuning a base model on the filtered Magpie data and observing substantially lower scores than the official Llama-3-8B-Instruct across AlpacaEval, ArenaHard, and WildBench would falsify the central performance claim.
Original abstract
High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method for generating large-scale alignment data named Magpie. Our key observation is that aligned LLMs like Llama-3-Instruct can generate a user query when we input only the left-side templates up to the position reserved for user messages, thanks to their auto-regressive nature. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. We perform a comprehensive analysis of the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base with each dataset and evaluate the performance of the fine-tuned models. Our results indicate that in some tasks, models fine-tuned with Magpie perform comparably to the official Llama-3-8B-Instruct, despite the latter being enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie solely for SFT can surpass the performance of previous public datasets utilized for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. This advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Magpie, a method to synthesize large-scale alignment data by prompting aligned LLMs such as Llama-3-Instruct with only left-side templates to generate user queries via their autoregressive prior. This produces 4 million instruction-response pairs from which 300K high-quality instances are selected; fine-tuning Llama-3-8B-Base on this subset yields models that match or exceed the official Llama-3-8B-Instruct on alignment benchmarks (AlpacaEval, ArenaHard, WildBench) despite using far less data than the 10M-point SFT+preference pipeline.
Significance. If the central empirical claim holds under scrutiny, the work provides a practical, open route to high-quality instruction data that could reduce dependence on proprietary alignment corpora and human curation, enabling more reproducible SFT for open models. The approach leverages an existing aligned model's prior rather than external prompts, which is a clean technical contribution if the generated queries prove sufficiently diverse and the filtering is transparent.
major comments (3)
- [§3] §3 (Data Generation and Filtering): The selection of the 300K high-quality subset is described only at a high level; explicit quality criteria, thresholds, or scoring functions are not provided, which is load-bearing because the performance parity claim rests on this curation step rather than raw volume.
- [§4] §4 (Experiments): No exact benchmark scores, confidence intervals, or statistical significance tests are reported for the Magpie-tuned models versus Llama-3-8B-Instruct or prior datasets; without these, it is impossible to evaluate whether the observed comparability is robust or sensitive to the 300K selection.
- [§4.3] §4.3 (Contamination and Diversity): The paper does not report checks for data contamination between the Magpie-generated set and the evaluation benchmarks, nor quantitative measures (e.g., embedding diversity, n-gram overlap with real user logs) confirming that the left-template-elicited queries are representative of actual user distributions.
minor comments (2)
- [Abstract] The abstract and §4 refer to 'some tasks' where Magpie matches the official model; a table or figure explicitly listing per-benchmark deltas would improve clarity.
- [§3] Notation for the left-side template construction could be formalized (e.g., as a prompt prefix function) to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to enhance transparency, provide additional quantitative details, and strengthen the experimental reporting.
Point-by-point responses
Referee: [§3] §3 (Data Generation and Filtering): The selection of the 300K high-quality subset is described only at a high level; explicit quality criteria, thresholds, or scoring functions are not provided, which is load-bearing because the performance parity claim rests on this curation step rather than raw volume.
Authors: We agree that the filtering process requires more explicit description. The 300K subset was obtained via a multi-stage pipeline: (1) automatic filters on query length (50-500 tokens), perplexity under Llama-3-8B-Base, and removal of duplicates via MinHash; (2) a quality scorer combining response coherence (via self-consistency checks) and alignment proxies (helpfulness/harmlessness scores from an auxiliary reward model); and (3) diversity-aware sampling via k-means clustering on embeddings. We will expand §3 with the precise thresholds, scoring formula, and pseudocode for reproducibility. revision: yes
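To make that description concrete, the sketch below mirrors the three stages as stated; every threshold, the perplexity and reward-model callables, and the embedding function are illustrative assumptions rather than the paper's released pipeline, and the MinHash deduplication stage is omitted for brevity.

```python
# Illustrative multi-stage filter over synthetic (instruction, response) records.
# ppl(), reward_score(), and embed() are assumed callables; thresholds are made up.
import numpy as np
from sklearn.cluster import KMeans

def stage1_basic_filters(records, ppl, min_tokens=50, max_tokens=500, ppl_max=30.0):
    # Length band on the query (whitespace tokens as a rough proxy) plus a
    # perplexity cap under a base model.
    return [r for r in records
            if min_tokens <= len(r["instruction"].split()) <= max_tokens
            and ppl(r["instruction"]) <= ppl_max]

def stage2_quality_scores(records, reward_score, reward_min=0.5):
    # Keep pairs whose response clears an auxiliary reward-model threshold.
    return [r for r in records if reward_score(r) >= reward_min]

def stage3_diversity_sample(records, embed, n_clusters=1000, seed=0):
    # Cluster instruction embeddings and keep the record nearest each centroid,
    # spreading the retained subset across the instruction space.
    X = np.stack([embed(r["instruction"]) for r in records])
    km = KMeans(n_clusters=min(n_clusters, len(records)), random_state=seed,
                n_init="auto").fit(X)
    best = {}
    for i, lab in enumerate(km.labels_):
        d = float(np.linalg.norm(X[i] - km.cluster_centers_[lab]))
        if lab not in best or d < best[lab][0]:
            best[lab] = (d, i)
    return [records[i] for _, i in best.values()]
```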
Referee: [§4] §4 (Experiments): No exact benchmark scores, confidence intervals, or statistical significance tests are reported for the Magpie-tuned models versus Llama-3-8B-Instruct or prior datasets; without these, it is impossible to evaluate whether the observed comparability is robust or sensitive to the 300K selection.
Authors: The submitted manuscript contains comparative tables, but we acknowledge the absence of exact numerical values with uncertainty estimates and formal tests. In the revision we will report the precise win rates / scores on AlpacaEval, ArenaHard, and WildBench for all models, include 95% confidence intervals obtained via bootstrap resampling over 1000 iterations, and add paired statistical significance tests (e.g., bootstrap p-values) comparing Magpie-tuned models against Llama-3-8B-Instruct and prior datasets. This will clarify robustness with respect to the 300K selection. revision: yes
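The bootstrap procedure described there is standard; a minimal sketch over per-comparison win/loss labels (an assumed 0/1 array, not the paper's data) would be:

```python
# Percentile bootstrap for a benchmark win rate, 1 = win and 0 = loss per comparison.
import numpy as np

def bootstrap_winrate_ci(wins, n_boot=1000, alpha=0.05, seed=0):
    wins = np.asarray(wins, dtype=float)
    rng = np.random.default_rng(seed)
    resampled = [rng.choice(wins, size=len(wins), replace=True).mean()
                 for _ in range(n_boot)]
    lo, hi = np.percentile(resampled, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return wins.mean(), (lo, hi)

# Example with simulated labels: point estimate and 95% CI over 805 comparisons.
est, (lo, hi) = bootstrap_winrate_ci(np.random.binomial(1, 0.45, size=805))
```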
Referee: [§4.3] §4.3 (Contamination and Diversity): The paper does not report checks for data contamination between the Magpie-generated set and the evaluation benchmarks, nor quantitative measures (e.g., embedding diversity, n-gram overlap with real user logs) confirming that the left-template-elicited queries are representative of actual user distributions.
Authors: We did not include contamination or quantitative diversity analyses in the initial submission. We will add these in the revision: (1) contamination checks via 5-gram overlap and cosine similarity of sentence embeddings between the Magpie set and the test splits of AlpacaEval, ArenaHard, and WildBench; (2) diversity metrics including average pairwise embedding distance (using Llama-3 embeddings) and n-gram overlap statistics against public user-query corpora such as ShareGPT and LMSYS-Chat. These additions will demonstrate both low contamination risk and representativeness of the generated queries. revision: yes
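A minimal sketch of the proposed 5-gram contamination check follows; whitespace tokenization and the any-overlap flagging rule are simplifying assumptions.

```python
# Flag synthetic instructions that share any 5-gram with a benchmark prompt.
def ngrams(text: str, n: int = 5) -> set:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_rate(synthetic_queries, benchmark_prompts, n: int = 5) -> float:
    bench = set()
    for prompt in benchmark_prompts:
        bench |= ngrams(prompt, n)
    flagged = sum(1 for q in synthetic_queries if ngrams(q, n) & bench)
    return flagged / max(len(synthetic_queries), 1)
```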
Circularity Check
No significant circularity; derivation is empirical and externally benchmarked
Full rationale
The paper describes an empirical pipeline: prompt Llama-3-Instruct with left-side templates to auto-regressively generate user queries, pair them with model responses, filter to 300K instances, fine-tune Llama-3-8B-Base, and evaluate on independent external benchmarks (AlpacaEval, ArenaHard, WildBench). No equation or step reduces by construction to its own inputs; no parameter is fitted on a subset and then renamed as a prediction; no load-bearing claim rests on a self-citation chain or imported uniqueness theorem. The central result—that the synthesized data yields competitive alignment—is presented as an observable outcome of the procedure rather than a definitional tautology, making the argument self-contained against external metrics.
Axiom & Free-Parameter Ledger
free parameters (1)
- Quality selection criteria used to retain the 300K subset
axioms (1)
- Domain assumption: aligned LLMs can generate coherent and useful user queries when prompted with only the left-side conversation templates.
Forward citations
Cited by 21 Pith papers
- Large Language Diffusion Models. LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach. A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
- NodeSynth: Socially Aligned Synthetic Data for AI Evaluation. NodeSynth generates evidence-anchored synthetic queries that trigger up to five times higher failure rates in mainstream LLMs than human-authored benchmarks.
- Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion. MORA breaks the safety-helpfulness trade-off in LLM alignment by pre-sampling single-reward prompts and rewriting them to expand multi-dimensional reward diversity, yielding 5-12.4% single-preference gains in sequenti...
- Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion. MORA breaks the safety-helpfulness ceiling in LLMs by pre-sampling single-reward prompts and rewriting them to incorporate multi-dimensional intents, delivering 5-12.4% gains in sequential alignment and 4.6% overall i...
- PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding. PARD-2 uses Confidence-Adaptive Token optimization to align draft model training with acceptance length in speculative decoding, enabling dual-mode operation and up to 6.94x lossless speedup on Llama3.1-8B.
- Hypothesis generation and updating in large language models. LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.
- ZAYA1-8B Technical Report. ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.
- TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning. TwinGate deploys a stateful dual-encoder system with asymmetric contrastive learning to detect decompositional jailbreaks in untraceable LLM traffic at high recall and low false-positive rate with negligible latency.
- To Adapt or not to Adapt, Rethinking the Value of Medical Knowledge-Aware Large Language Models. Domain-adapted clinical LLMs provide only marginal and unstable gains over general models on English clinical MCQA benchmarks, while new Spanish Marmoka models perform better.
- SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding. SPEED-Bench is a new standardized benchmark for speculative decoding that supplies semantically diverse qualitative data and throughput-oriented splits across concurrency levels, integrated with vLLM and TensorRT-LLM.
- Multi-Token Prediction via Self-Distillation. Self-distillation turns pretrained autoregressive LMs into multi-token predictors that decode over 3x faster with under 5% accuracy drop on GSM8K.
- SmolVLM: Redefining small and efficient multimodal models. SmolVLM-256M outperforms a 300-times larger model using under 1 GB GPU memory, while the 2.2B version matches state-of-the-art VLMs at half the memory cost.
- Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling. InternVL 2.5 is the first open-source MLLM to surpass 70% on the MMMU benchmark via model, data, and test-time scaling, with a 3.7-point gain from chain-of-thought reasoning.
- Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training. LoPT delivers competitive LLM post-training results by training only the top half on the task objective and using feature reconstruction to update the bottom half.
- Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training. LoPT achieves competitive task performance in LLM post-training by limiting task gradients to the upper model half and training the lower half with local feature reconstruction.
- Kimi-Audio Technical Report. Kimi-Audio is an open-source audio foundation model that achieves state-of-the-art results on speech recognition, audio understanding, question answering, and conversation after pre-training on more than 13 million ho...
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model. SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.
- DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding. DeepSeek-VL2 is a series of MoE vision-language models using dynamic tiling and latent attention that reach competitive or state-of-the-art results on VQA, OCR, document understanding and grounding with 1.0B to 4.5B a...
- LLaVA-OneVision: Easy Visual Task Transfer. LLaVA-OneVision is the first single open LMM to simultaneously achieve strong performance in single-image, multi-image, and video scenarios with cross-scenario transfer capabilities.
- VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding. VideoLLaMA3 uses a vision-centric training paradigm and token-reduction design to reach competitive results on image and video benchmarks.