Pith · machine review for the scientific record

arxiv: 2406.08464 · v2 · submitted 2024-06-12 · 💻 cs.CL · cs.AI

Recognition: no theorem link

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 06:54 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords alignment data synthesis · instruction tuning · synthetic data · LLM alignment · self-synthesis · Llama-3 · chat templates · data scaling

The pith

Prompting aligned LLMs like Llama-3-Instruct with only left-side conversation templates produces millions of realistic user queries and responses for alignment training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that aligned models can generate their own high-quality instruction data when given only the prefix of a chat template up to the user slot. Thanks to auto-regression, the model completes a plausible user query and then responds to it, yielding four million synthetic pairs without human-written seeds or narrow prompt scopes. After selection of three hundred thousand high-quality instances, fine-tuning a base model on this data reaches performance comparable to the official Llama-3-8B-Instruct on several benchmarks, even though the official model was trained on ten million data points plus subsequent feedback learning. A reader would care because the approach removes the need for expensive human curation or private datasets, opening a route to large-scale open alignment data.

Core claim

By feeding aligned LLMs only the left-side templates up to the position reserved for user messages, the models generate realistic user queries and then provide responses to them. This process extracts four million instructions and responses from Llama-3-Instruct; after filtering, three hundred thousand high-quality examples are retained. Fine-tuning Llama-3-8B-Base on these examples produces models that match the official Llama-3-8B-Instruct on some tasks and outperform earlier public datasets even when used solely for supervised fine-tuning, as measured on AlpacaEval, ArenaHard, and WildBench.

What carries the argument

Magpie synthesis: supplying an aligned LLM with only the system prompt and assistant prefix in a chat template so that auto-regression fills in a user query followed by its own response.
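The mechanism can be made concrete with a short sketch. The template strings follow the published Llama-3 chat format; the function names are illustrative and the actual sampling call (any standard generation API) is omitted:

```python
def magpie_prefix(system_prompt: str = "") -> str:
    """Build the Llama-3 chat-template prefix up to the open user slot."""
    parts = ["<|begin_of_text|>"]
    if system_prompt:
        parts.append(
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{system_prompt}<|eot_id|>"
        )
    # Open the user turn but leave its content empty: an aligned model,
    # asked to continue this string, auto-regressively invents a query.
    parts.append("<|start_header_id|>user<|end_header_id|>\n\n")
    return "".join(parts)


def extract_query(continuation: str) -> str:
    """The generated continuation runs until the end-of-turn token;
    the text before it is the synthesized user query."""
    return continuation.split("<|eot_id|>")[0].strip()


prefix = magpie_prefix("You are a helpful assistant.")
# A second call with the full template (prefix + invented query +
# assistant header) would then elicit the model's own response.
```

Because nothing after the user header is supplied, every sample draws a fresh query from the model's prior over user turns, which is what gives the method its scale.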

If this is right

  • Models trained only on Magpie data for supervised fine-tuning can equal or exceed the official Llama-3-8B-Instruct on selected alignment benchmarks, despite the official model using ten million data points plus feedback learning.
  • A single round of Magpie data alone surpasses prior public instruction sets that combined supervised fine-tuning with direct preference optimization on UltraFeedback.
  • The method scales to four million generated examples from one aligned model with no additional human prompting effort.
  • High-quality subsets of the synthetic data suffice for competitive downstream performance after standard filtering.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same left-side prompting could be iterated on the newly aligned model itself to create successive rounds of self-generated data.
  • Because the approach relies only on an already-aligned open model, it could be applied to other base models or languages to produce domain-specific alignment sets without new human effort.
  • The gap between generated data volume and retained high-quality subset suggests that automatic quality filters remain a critical control point for scaling.

Load-bearing premise

The queries the model invents from partial templates are diverse enough and close enough to real user needs to produce effective alignment after filtering.

What would settle it

Fine-tuning a base model on the filtered Magpie data and observing substantially lower scores than the official Llama-3-8B-Instruct across AlpacaEval, ArenaHard, and WildBench would falsify the central performance claim.

Original abstract

High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method for generating large-scale alignment data named Magpie. Our key observation is that aligned LLMs like Llama-3-Instruct can generate a user query when we input only the left-side templates up to the position reserved for user messages, thanks to their auto-regressive nature. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. We perform a comprehensive analysis of the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base with each dataset and evaluate the performance of the fine-tuned models. Our results indicate that in some tasks, models fine-tuned with Magpie perform comparably to the official Llama-3-8B-Instruct, despite the latter being enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie solely for SFT can surpass the performance of previous public datasets utilized for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. This advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Magpie, a method to synthesize large-scale alignment data by prompting aligned LLMs such as Llama-3-Instruct with only left-side templates to generate user queries via their autoregressive prior. This produces 4 million instruction-response pairs from which 300K high-quality instances are selected; fine-tuning Llama-3-8B-Base on this subset yields models that match or exceed the official Llama-3-8B-Instruct on alignment benchmarks (AlpacaEval, ArenaHard, WildBench) despite using far less data than the 10M-point SFT+preference pipeline.

Significance. If the central empirical claim holds under scrutiny, the work provides a practical, open route to high-quality instruction data that could reduce dependence on proprietary alignment corpora and human curation, enabling more reproducible SFT for open models. The approach leverages an existing aligned model's prior rather than external prompts, which is a clean technical contribution if the generated queries prove sufficiently diverse and the filtering is transparent.

major comments (3)
  1. [§3] §3 (Data Generation and Filtering): The selection of the 300K high-quality subset is described only at a high level; explicit quality criteria, thresholds, or scoring functions are not provided, which is load-bearing because the performance parity claim rests on this curation step rather than raw volume.
  2. [§4] §4 (Experiments): No exact benchmark scores, confidence intervals, or statistical significance tests are reported for the Magpie-tuned models versus Llama-3-8B-Instruct or prior datasets; without these, it is impossible to evaluate whether the observed comparability is robust or sensitive to the 300K selection.
  3. [§4.3] §4.3 (Contamination and Diversity): The paper does not report checks for data contamination between the Magpie-generated set and the evaluation benchmarks, nor quantitative measures (e.g., embedding diversity, n-gram overlap with real user logs) confirming that the left-template-elicited queries are representative of actual user distributions.
minor comments (2)
  1. [Abstract] The abstract and §4 refer to 'some tasks' where Magpie matches the official model; a table or figure explicitly listing per-benchmark deltas would improve clarity.
  2. [§3] Notation for the left-side template construction could be formalized (e.g., as a prompt prefix function) to aid reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to enhance transparency, provide additional quantitative details, and strengthen the experimental reporting.

Point-by-point responses
  1. Referee: [§3] §3 (Data Generation and Filtering): The selection of the 300K high-quality subset is described only at a high level; explicit quality criteria, thresholds, or scoring functions are not provided, which is load-bearing because the performance parity claim rests on this curation step rather than raw volume.

    Authors: We agree that the filtering process requires more explicit description. The 300K subset was obtained via a multi-stage pipeline: (1) automatic filters on query length (50-500 tokens), perplexity under Llama-3-8B-Base, and removal of duplicates via MinHash; (2) a quality scorer combining response coherence (via self-consistency checks) and alignment proxies (helpfulness/harmlessness scores from an auxiliary reward model); and (3) diversity-aware sampling via k-means clustering on embeddings. We will expand §3 with the precise thresholds, scoring formula, and pseudocode for reproducibility. revision: yes

  2. Referee: [§4] §4 (Experiments): No exact benchmark scores, confidence intervals, or statistical significance tests are reported for the Magpie-tuned models versus Llama-3-8B-Instruct or prior datasets; without these, it is impossible to evaluate whether the observed comparability is robust or sensitive to the 300K selection.

    Authors: The submitted manuscript contains comparative tables, but we acknowledge the absence of exact numerical values with uncertainty estimates and formal tests. In the revision we will report the precise win rates / scores on AlpacaEval, ArenaHard, and WildBench for all models, include 95% confidence intervals obtained via bootstrap resampling over 1000 iterations, and add paired statistical significance tests (e.g., bootstrap p-values) comparing Magpie-tuned models against Llama-3-8B-Instruct and prior datasets. This will clarify robustness with respect to the 300K selection. revision: yes

  3. Referee: [§4.3] §4.3 (Contamination and Diversity): The paper does not report checks for data contamination between the Magpie-generated set and the evaluation benchmarks, nor quantitative measures (e.g., embedding diversity, n-gram overlap with real user logs) confirming that the left-template-elicited queries are representative of actual user distributions.

    Authors: We did not include contamination or quantitative diversity analyses in the initial submission. We will add these in the revision: (1) contamination checks via 5-gram overlap and cosine similarity of sentence embeddings between the Magpie set and the test splits of AlpacaEval, ArenaHard, and WildBench; (2) diversity metrics including average pairwise embedding distance (using Llama-3 embeddings) and n-gram overlap statistics against public user-query corpora such as ShareGPT and LMSYS-Chat. These additions will demonstrate both low contamination risk and representativeness of the generated queries. revision: yes
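The multi-stage selection sketched in response 1 can be illustrated in miniature. The length bounds, the exact-duplicate check, and the pre-computed score field below are simplified stand-ins for the perplexity, MinHash, and reward-model machinery described there, not the paper's published pipeline:

```python
def select_high_quality(pairs, min_len=50, max_len=500, keep=300_000):
    """Reduce raw (query, response, score) triples to a curated subset.

    Stage 1: length filter on the query (crude whitespace token count).
    Stage 2: exact-duplicate removal (stand-in for MinHash dedup).
    Stage 3: keep the top-`keep` triples by quality score (stand-in
             for a reward-model / coherence scorer).
    """
    seen = set()
    kept = []
    for query, response, score in pairs:
        n_tokens = len(query.split())
        if not (min_len <= n_tokens <= max_len):
            continue
        key = query.lower()
        if key in seen:
            continue
        seen.add(key)
        kept.append((query, response, score))
    kept.sort(key=lambda t: t[2], reverse=True)
    return kept[:keep]
```

In the paper's setting this step is what turns 4M raw generations into the 300K training set, so the thresholds are load-bearing, as the referee notes.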
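The bootstrap reporting proposed in response 2 is a standard recipe; a minimal sketch, assuming per-example 0/1 win judgments of the kind AlpacaEval-style pairwise evaluation produces (the 1000-iteration count follows the response, the rest is generic):

```python
import random


def bootstrap_ci(wins, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap over per-example win/loss outcomes.

    wins: list of 0/1 outcomes. Returns (mean, ci_low, ci_high).
    """
    rng = random.Random(seed)
    n = len(wins)
    means = []
    for _ in range(n_boot):
        # Resample the examples with replacement and record the win rate.
        sample = [wins[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(wins) / n, lo, hi
```

Reporting such intervals for the Magpie-tuned models and the official Llama-3-8B-Instruct would let a reader see whether the claimed comparability survives resampling noise.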
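The contamination check proposed in response 3 can likewise be sketched. The 5-gram granularity follows the response; the embedding-cosine half of the check is omitted for brevity:

```python
def ngrams(text, n=5):
    """Set of word n-grams in a lowercased, whitespace-tokenized text."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def contamination_rate(train_texts, test_texts, n=5):
    """Fraction of test-set n-grams that occur anywhere in training text.

    A value near 0 suggests the synthetic set does not leak benchmark
    items; a high value flags likely contamination.
    """
    train_grams = set()
    for t in train_texts:
        train_grams |= ngrams(t, n)
    test_grams = set()
    for t in test_texts:
        test_grams |= ngrams(t, n)
    if not test_grams:
        return 0.0
    return len(test_grams & train_grams) / len(test_grams)
```

Run against the Magpie queries and each benchmark's test split, this one number would directly address the referee's contamination concern.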

Circularity Check

0 steps flagged

No significant circularity; derivation is empirical and externally benchmarked

Full rationale

The paper describes an empirical pipeline: prompt Llama-3-Instruct with left-side templates to auto-regressively generate user queries, pair them with model responses, filter to 300K instances, fine-tune Llama-3-8B-Base, and evaluate on independent external benchmarks (AlpacaEval, ArenaHard, WildBench). No equation or step reduces by construction to its own inputs; no parameter is fitted on a subset and then renamed as a prediction; no load-bearing claim rests on a self-citation chain or imported uniqueness theorem. The central result—that the synthesized data yields competitive alignment—is presented as an observable outcome of the procedure rather than a definitional tautology, making the argument self-contained against external metrics.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method rests on the domain assumption that aligned LLMs will produce useful queries when given partial templates, plus an unspecified selection process for the final 300K examples.

free parameters (1)
  • quality selection criteria
    The rules or thresholds used to reduce 4 million generated pairs to 300K high-quality instances are not specified.
axioms (1)
  • domain assumption: Aligned LLMs can generate coherent and useful user queries when prompted with only the left-side conversation templates.
    This is the central observation stated in the abstract that enables the entire synthesis pipeline.

pith-pipeline@v0.9.0 · 5649 in / 1322 out tokens · 55373 ms · 2026-05-16T06:54:57.162248+00:00 · methodology

discussion (0)


Forward citations

Cited by 21 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Large Language Diffusion Models

    cs.CL 2025-02 unverdicted novelty 8.0

    LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

  2. Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

    cs.LG 2025-02 unverdicted novelty 7.0

    A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.

  3. NodeSynth: Socially Aligned Synthetic Data for AI Evaluation

    cs.LG 2026-05 unverdicted novelty 6.0

    NodeSynth generates evidence-anchored synthetic queries that trigger up to five times higher failure rates in mainstream LLMs than human-authored benchmarks.

  4. Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion

    cs.AI 2026-05 unverdicted novelty 6.0

    MORA breaks the safety-helpfulness trade-off in LLM alignment by pre-sampling single-reward prompts and rewriting them to expand multi-dimensional reward diversity, yielding 5-12.4% single-preference gains in sequenti...

  5. Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion

    cs.AI 2026-05 unverdicted novelty 6.0

    MORA breaks the safety-helpfulness ceiling in LLMs by pre-sampling single-reward prompts and rewriting them to incorporate multi-dimensional intents, delivering 5-12.4% gains in sequential alignment and 4.6% overall i...

  6. PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding

    cs.CL 2026-05 unverdicted novelty 6.0

    PARD-2 uses Confidence-Adaptive Token optimization to align draft model training with acceptance length in speculative decoding, enabling dual-mode operation and up to 6.94x lossless speedup on Llama3.1-8B.

  7. Hypothesis generation and updating in large language models

    cs.LG 2026-05 unverdicted novelty 6.0

    LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.

  8. ZAYA1-8B Technical Report

    cs.AI 2026-05 unverdicted novelty 6.0

    ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

  9. TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

    cs.CR 2026-04 unverdicted novelty 6.0

    TwinGate deploys a stateful dual-encoder system with asymmetric contrastive learning to detect decompositional jailbreaks in untraceable LLM traffic at high recall and low false-positive rate with negligible latency.

  10. To Adapt or not to Adapt, Rethinking the Value of Medical Knowledge-Aware Large Language Models

    cs.CL 2026-04 unverdicted novelty 6.0

    Domain-adapted clinical LLMs provide only marginal and unstable gains over general models on English clinical MCQA benchmarks, while new Spanish Marmoka models perform better.

  11. SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

    cs.DC 2026-02 unverdicted novelty 6.0

    SPEED-Bench is a new standardized benchmark for speculative decoding that supplies semantically diverse qualitative data and throughput-oriented splits across concurrency levels, integrated with vLLM and TensorRT-LLM.

  12. Multi-Token Prediction via Self-Distillation

    cs.CL 2026-02 unverdicted novelty 6.0

    Self-distillation turns pretrained autoregressive LMs into multi-token predictors that decode over 3x faster with under 5% accuracy drop on GSM8K.

  13. SmolVLM: Redefining small and efficient multimodal models

    cs.AI 2025-04 unverdicted novelty 6.0

    SmolVLM-256M outperforms a 300-times larger model using under 1 GB GPU memory, while the 2.2B version matches state-of-the-art VLMs at half the memory cost.

  14. Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

    cs.CV 2024-12 unverdicted novelty 6.0

    InternVL 2.5 is the first open-source MLLM to surpass 70% on the MMMU benchmark via model, data, and test-time scaling, with a 3.7-point gain from chain-of-thought reasoning.

  15. Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training

    cs.CL 2026-05 unverdicted novelty 5.0

    LoPT delivers competitive LLM post-training results by training only the top half on the task objective and using feature reconstruction to update the bottom half.

  16. Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training

    cs.CL 2026-05 unverdicted novelty 5.0

    LoPT achieves competitive task performance in LLM post-training by limiting task gradients to the upper model half and training the lower half with local feature reconstruction.

  17. Kimi-Audio Technical Report

    eess.AS 2025-04 unverdicted novelty 5.0

    Kimi-Audio is an open-source audio foundation model that achieves state-of-the-art results on speech recognition, audio understanding, question answering, and conversation after pre-training on more than 13 million ho...

  18. SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

    cs.CL 2025-02 unverdicted novelty 5.0

    SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.

  19. DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

    cs.CV 2024-12 accept novelty 5.0

    DeepSeek-VL2 is a series of MoE vision-language models using dynamic tiling and latent attention that reach competitive or state-of-the-art results on VQA, OCR, document understanding and grounding with 1.0B to 4.5B a...

  20. LLaVA-OneVision: Easy Visual Task Transfer

    cs.CV 2024-08 unverdicted novelty 5.0

    LLaVA-OneVision is the first single open LMM to simultaneously achieve strong performance in single-image, multi-image, and video scenarios with cross-scenario transfer capabilities.

  21. VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

    cs.CV 2025-01 unverdicted novelty 4.0

    VideoLLaMA3 uses a vision-centric training paradigm and token-reduction design to reach competitive results on image and video benchmarks.

Reference graph

Works this paper leans on

126 extracted references · 126 canonical work pages · cited by 19 Pith papers · 21 internal anchors

  1. [5]

    and Stoica, Ion and Xing, Eric P

    Chiang, Wei-Lin and Li, Zhuohan and Lin, Zi and Sheng, Ying and Wu, Zhanghao and Zhang, Hao and Zheng, Lianmin and Zhuang, Siyuan and Zhuang, Yonghao and Gonzalez, Joseph E. and Stoica, Ion and Xing, Eric P. , month =. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90\ url =

  2. [6]

    EMNLP , year=

    Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts , author=. EMNLP , year=

  3. [8]

    Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency , pages=

    What does it mean for a language model to preserve privacy? , author=. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency , pages=

  4. [9]

    2018 , publisher=

    Improving language understanding by generative pre-training , author=. 2018 , publisher=

  5. [11]

    International Conference on Machine Learning , pages=

    Bag of tricks for training data extraction from language models , author=. International Conference on Machine Learning , pages=. 2023 , organization=

  6. [14]

    Advances in Neural Information Processing Systems , volume=

    Emergent and predictable memorization in large language models , author=. Advances in Neural Information Processing Systems , volume=

  7. [15]

    2024 , eprint=

    MAmmoTH2: Scaling Instructions from the Web , author=. 2024 , eprint=

  8. [17]

    ArXiv , year=

    MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning , author=. ArXiv , year=

  9. [20]

    International Conference on Learning Representations , year=

    Thieves on Sesame Street! Model Extraction of BERT-based APIs , author=. International Conference on Learning Representations , year=

  10. [21]

    30th USENIX Security Symposium (USENIX Security 21) , pages=

    Extracting training data from large language models , author=. 30th USENIX Security Symposium (USENIX Security 21) , pages=

  11. [22]

    Hashimoto , title =

    Xuechen Li and Tianyi Zhang and Yann Dubois and Rohan Taori and Ishaan Gulrajani and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto , title =. GitHub repository , howpublished =. 2023 , publisher =

  12. [23]

    Advances in Neural Information Processing Systems , volume=

    Lima: Less is more for alignment , author=. Advances in Neural Information Processing Systems , volume=

  13. [25]

    Gonzalez and Ion Stoica , booktitle=

    Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zi Lin and Zhuohan Li and Dacheng Li and Eric Xing and Hao Zhang and Joseph E. Gonzalez and Ion Stoica , booktitle=. Judging. 2023 , url=

  14. [26]

    Gonzalez and Ion Stoica , month =

    Tianle Li and Wei-Lin Chiang and Evan Frick and Lisa Dunlap and Banghua Zhu and Joseph E. Gonzalez and Ion Stoica , month =. From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline , url =

  15. [28]

    International Conference on Artificial Intelligence and Statistics , pages=

    A general theoretical paradigm to understand learning from human preferences , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2024 , organization=

  16. [32]

    Thirty-seventh Conference on Neural Information Processing Systems , year=

    Direct Preference Optimization: Your Language Model is Secretly a Reward Model , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=

  17. [34]

    , author=

    Visualizing data using t-SNE. , author=. Journal of machine learning research , volume=

  18. [35]

    Hashimoto , title =

    Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto , title =. GitHub repository , howpublished =. 2023 , publisher =

  19. [39]

    WildChat: 1M Chat

    Wenting Zhao and Xiang Ren and Jack Hessel and Claire Cardie and Yejin Choi and Yuntian Deng , booktitle=. WildChat: 1M Chat. 2024 , url=

  20. [40]

    Gonzalez and Ion Stoica and Hao Zhang , booktitle=

    Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Tianle Li and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zhuohan Li and Zi Lin and Eric Xing and Joseph E. Gonzalez and Ion Stoica and Hao Zhang , booktitle=. 2024 , url=

  21. [45]

    2023 , publisher =

    OpenHermes 2.5: An Open Dataset of Synthetic Data for Generalist LLM Assistants , author =. 2023 , publisher =

  22. [46]

    and Khashabi, Daniel and Hajishirzi, Hannaneh

    Wang, Yizhong and Kordi, Yeganeh and Mishra, Swaroop and Liu, Alisa and Smith, Noah A. and Khashabi, Daniel and Hajishirzi, Hannaneh. Self-Instruct: Aligning Language Models with Self-Generated Instructions. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023

  23. [47]

    Advances in Neural Information Processing Systems , volume=

    Camel: Communicative agents for" mind" exploration of large language model society , author=. Advances in Neural Information Processing Systems , volume=

  24. [49]

    The Twelfth International Conference on Learning Representations , year=

    What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning , author=. The Twelfth International Conference on Learning Representations , year=

  25. [50]

    2022 , eprint=

    Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback , author=. 2022 , eprint=

  26. [51]

    International Conference on Learning Representations , year=

    Finetuned Language Models are Zero-Shot Learners , author=. International Conference on Learning Representations , year=

  27. [52]

    arXiv preprint arXiv:2309.15025 , year=

    Large language model alignment: A survey , author=. arXiv preprint arXiv:2309.15025 , year=

  28. [56]

    WildBench: Benchmarking Language Models with Challenging Tasks from Real Users in the Wild , author =

  29. [60]

    Thirteenth international conference on the principles of knowledge representation and reasoning , year=

    The winograd schema challenge , author=. Thirteenth international conference on the principles of knowledge representation and reasoning , year=

  30. [62]

    2023 , publisher =

    Edward Beeching and Clémentine Fourrier and Nathan Habib and Sheon Han and Nathan Lambert and Nazneen Rajani and Omar Sanseviero and Lewis Tunstall and Thomas Wolf , title =. 2023 , publisher =

  31. [63]

    Introducing the next generation of Claude , author=

  32. [64]

    Our next-generation model: Gemini 1.5 , author=

  33. [65]

    2023 , publisher =

    OpenHermes Dataset , author =. 2023 , publisher =

  34. [66]

    2023 , publisher =

    Databricks Dolly-15k , author =. 2023 , publisher =

  35. [67]

    2024 , eprint=

    The Faiss library , author=. 2024 , eprint=

  36. [68]

    Zephyr: Direct Distillation of LM Alignment

    Zephyr: Direct distillation of lm alignment , author=. arXiv preprint arXiv:2310.16944 , year=

  37. [69]

    Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year=

    Efficient Memory Management for Large Language Model Serving with PagedAttention , author=. Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year=

  38. [70]

    Advances in Neural Information Processing Systems , volume=

    Principle-driven self-alignment of language models from scratch with minimal human supervision , author=. Advances in Neural Information Processing Systems , volume=

  39. [72]

    2024 , eprint=

    Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint , author=. 2024 , eprint=

  40. [75]

    International Conference on Learning Representations , year=

    Multitask Prompted Training Enables Zero-Shot Task Generalization , author=. International Conference on Learning Representations , year=

  41. [76]

    OpenAssistant Conversations - Democratizing Large Language Model Alignment , url =

    K\". OpenAssistant Conversations - Democratizing Large Language Model Alignment , url =. Advances in Neural Information Processing Systems , editor =

  42. [80]

    Tran, Hoang and Glaze, Chris and Hancock, Braden , title =

  43. [81]

    Forty-first International Conference on Machine Learning , year=

    TrustLLM: Trustworthiness in Large Language Models , author=. Forty-first International Conference on Machine Learning , year=

  44. [84]

    Llama Team , title =

  45. [85]

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, et al. Phi-3 technical report: A highly capable language model locally on your phone. arXiv preprint arXiv:2404.14219, 2024

  46. [86]

    GPT-4 Technical Report

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023

  47. [87]

    A general theoretical paradigm to understand learning from human preferences

    Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, and Daniele Calandriello. A general theoretical paradigm to understand learning from human preferences. In International Conference on Artificial Intelligence and Statistics, pp.\ 4447--4455. PMLR, 2024

  48. [88]

    Qwen Technical Report

    Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng X...

  49. [89]

    Special characters attack: Toward scalable training data extraction from large language models

    Yang Bai, Ge Pei, Jindong Gu, Yong Yang, and Xingjun Ma. Special characters attack: Toward scalable training data extraction from large language models. arXiv preprint arXiv:2405.05990, 2024

  50. [90]

    Training a helpful and harmless assistant with reinforcement learning from human feedback, 2022

    Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, ...

  51. [91]

    Open llm leaderboard

    Edward Beeching, Clémentine Fourrier, Nathan Habib, Sheon Han, Nathan Lambert, Nazneen Rajani, Omar Sanseviero, Lewis Tunstall, and Thomas Wolf. Open llm leaderboard. https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard, 2023

  52. [92]

    Emergent and predictable memorization in large language models

    Stella Biderman, Usvsn Prashanth, Lintang Sutawika, Hailey Schoelkopf, Quentin Anthony, Shivanshu Purohit, and Edward Raff. Emergent and predictable memorization in large language models. Advances in Neural Information Processing Systems, 36, 2023

  53. [93]

    What does it mean for a language model to preserve privacy?

    Hannah Brown, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri, and Florian Tramèr. What does it mean for a language model to preserve privacy? In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp.\ 2280--2292, 2022

  54. [94]

    Extracting training data from large language models

    Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pp.\ 2633--2650, 2021

  55. [95]

    GenQA: Generating Millions of Instructions from a Handful of Prompts

    Jiuhai Chen, Rifaa Qadri, Yuxin Wen, Neel Jain, John Kirchenbauer, Tianyi Zhou, and Tom Goldstein. Genqa: Generating millions of instructions from a handful of prompts. arXiv preprint arXiv:2406.10323, 2024

  56. [96]

    Alpagasus: Training a better alpaca with fewer data

    Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, et al. Alpagasus: Training a better alpaca with fewer data. arXiv preprint arXiv:2307.08701, 2023

  57. [97]

    Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality

    Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E. Gonzalez, Ion Stoica, and Eric P. Xing. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality, 2023. URL https://lmsys.org/blog/2023-03-30-vicuna/

  58. [98]

    Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

    Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? try arc, the ai2 reasoning challenge. arXiv preprint arXiv:1803.05457, 2018

  59. [99]

    Training Verifiers to Solve Math Word Problems

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021

  60. [100]

    UltraFeedback: Boosting Language Models with High-quality Feedback

    Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Wei Zhu, Yuan Ni, Guotong Xie, Zhiyuan Liu, and Maosong Sun. Ultrafeedback: Boosting language models with high-quality feedback. arXiv preprint arXiv:2310.01377, 2023

  61. [101]

    Databricks dolly-15k, 2023

    Databricks. Databricks dolly-15k, 2023. URL https://huggingface.co/datasets/databricks/databricks-dolly-15k

  62. [102]

    On the limitations of reference-free evaluations of generated text

    Daniel Deutsch, Rotem Dror, and Dan Roth. On the limitations of reference-free evaluations of generated text. In Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (eds.), Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp.\ 10960--10977, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Lin...

  63. [103]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018

  64. [104]

    Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

    Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Zhi Zheng, Shengding Hu, Zhiyuan Liu, Maosong Sun, and Bowen Zhou. Enhancing chat language models by scaling high-quality instructional conversations. arXiv preprint arXiv:2305.14233, 2023

  65. [105]

    The Faiss Library

    Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. The Faiss library. 2024

  66. [106]

    The Llama 3 Herd of Models

    Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

  67. [107]

    Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators

    Yann Dubois, Balázs Galambosi, Percy Liang, and Tatsunori B Hashimoto. Length-controlled alpacaeval: A simple way to debias automatic evaluators. arXiv preprint arXiv:2404.04475, 2024

  68. [108]

    KTO: Model Alignment as Prospect Theoretic Optimization

    Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, and Douwe Kiela. Kto: Model alignment as prospect theoretic optimization. arXiv preprint arXiv:2402.01306, 2024

  69. [109]

    Better synthetic data by retrieving and transforming existing datasets

    Saumya Gandhi, Ritu Gala, Vijay Viswanathan, Tongshuang Wu, and Graham Neubig. Better synthetic data by retrieving and transforming existing datasets. arXiv preprint arXiv:2404.14361, 2024

  70. [110]

    Are We Done with MMLU?

    Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao, Xiaotang Du, Mohammad Reza Ghasemi Madani, et al. Are we done with mmlu? arXiv preprint arXiv:2406.04127, 2024

  71. [111]

    Measuring Massive Multitask Language Understanding

    Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300, 2020

  72. [112]

    ORPO: Monolithic Preference Optimization without Reference Model

    Jiwoo Hong, Noah Lee, and James Thorne. ORPO: Monolithic preference optimization without reference model. arXiv preprint arXiv:2403.07691, 2024

  73. [113]

    TrustLLM: Trustworthiness in Large Language Models

    Yue Huang, Lichao Sun, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Hanchi Sun, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric P. Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang, Huan Zhang, Huaxiu Yao, Mano...

  74. [114]

    Camels in a changing climate: Enhancing lm adaptation with tulu 2

    Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A Smith, Iz Beltagy, et al. Camels in a changing climate: Enhancing lm adaptation with tulu 2. arXiv preprint arXiv:2311.10702, 2023

  75. [115]

    Alpaca against vicuna: Using llms to uncover memorization of llms

    Aly M Kassem, Omar Mahmoud, Niloofar Mireshghallah, Hyunwoo Kim, Yulia Tsvetkov, Yejin Choi, Sherif Saad, and Santu Rana. Alpaca against vicuna: Using llms to uncover memorization of llms. arXiv preprint arXiv:2403.04801, 2024

  76. [116]

    OpenAssistant Conversations - Democratizing Large Language Model Alignment

    Andreas Köpf, Yannic Kilcher, Dimitri von Rütte, Sotiris Anagnostidis, Zhi Rui Tam, Keith Stevens, Abdullah Barhoum, Duc Nguyen, Oliver Stanley, Richárd Nagyfi, Shahul ES, Sameer Suri, David Glushkov, Arnav Dantuluri, Andrew Maguire, Christoph Schuhmann, Huu Nguyen, and Alexander Mattick. Openassistant conversations - democratizing large langu...

  77. [117]

    Thieves on Sesame Street! Model Extraction of BERT-based APIs

    Kalpesh Krishna, Gaurav Singh Tomar, Ankur P. Parikh, Nicolas Papernot, and Mohit Iyyer. Thieves on sesame street! model extraction of bert-based apis. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=Byl5NREFDr

  78. [118]

    Efficient Memory Management for Large Language Model Serving with PagedAttention

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with pagedattention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023

  79. [119]

    Rewardbench: Evaluating reward models for language modeling

    Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, et al. Rewardbench: Evaluating reward models for language modeling. arXiv preprint arXiv:2403.13787, 2024

  80. [120]

    The winograd schema challenge

    Hector Levesque, Ernest Davis, and Leora Morgenstern. The winograd schema challenge. In Thirteenth international conference on the principles of knowledge representation and reasoning, 2012
