Pith · machine review for the scientific record

arxiv: 2406.08464 · v2 · submitted 2024-06-12 · 💻 cs.CL · cs.AI

Recognition: no theorem link

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 06:54 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords alignment data synthesis · instruction tuning · synthetic data · LLM alignment · self-synthesis · Llama-3 · chat templates · data scaling

The pith

Prompting aligned LLMs like Llama-3-Instruct with only left-side conversation templates produces millions of realistic user queries and responses for alignment training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that aligned models can generate their own high-quality instruction data when given only the prefix of a chat template up to the user slot. Thanks to auto-regression, the model completes a plausible user query and then responds to it, yielding four million synthetic pairs without human-written seeds or narrow prompt scopes. After selection of three hundred thousand high-quality instances, fine-tuning a base model on this data reaches performance comparable to the official Llama-3-8B-Instruct on several benchmarks, even though the official model was trained on ten million data points plus subsequent feedback learning. A reader would care because the approach removes the need for expensive human curation or private datasets, opening a route to large-scale open alignment data.

Core claim

By feeding aligned LLMs only the left-side templates up to the position reserved for user messages, the models generate realistic user queries and then provide responses to them. This process extracts four million instructions and responses from Llama-3-Instruct; after filtering, three hundred thousand high-quality examples are retained. Fine-tuning Llama-3-8B-Base on these examples produces models that match the official Llama-3-8B-Instruct on some tasks and outperform earlier public datasets even when used solely for supervised fine-tuning, as measured on AlpacaEval, ArenaHard, and WildBench.

What carries the argument

Magpie synthesis: supplying an aligned LLM with only the system prompt and assistant prefix in a chat template so that auto-regression fills in a user query followed by its own response.
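The mechanism can be made concrete with a short sketch. The template strings follow the published Llama-3 chat format; the function names are illustrative and the actual sampling call (any standard generation API) is omitted:

```python
def magpie_prefix(system_prompt: str = "") -> str:
    """Build the Llama-3 chat-template prefix up to the open user slot."""
    parts = ["<|begin_of_text|>"]
    if system_prompt:
        parts.append(
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{system_prompt}<|eot_id|>"
        )
    # Open the user turn but leave its content empty: an aligned model,
    # asked to continue this string, auto-regressively invents a query.
    parts.append("<|start_header_id|>user<|end_header_id|>\n\n")
    return "".join(parts)


def extract_query(continuation: str) -> str:
    """The generated continuation runs until the end-of-turn token;
    the text before it is the synthesized user query."""
    return continuation.split("<|eot_id|>")[0].strip()


prefix = magpie_prefix("You are a helpful assistant.")
# A second call with the full template (prefix + invented query +
# assistant header) would then elicit the model's own response.
```

Because nothing after the user header is supplied, every sample draws a fresh query from the model's prior over user turns, which is what gives the method its scale.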

If this is right

  • Models trained only on Magpie data for supervised fine-tuning can equal or exceed the official Llama-3-8B-Instruct on selected alignment benchmarks, despite the official model using ten million data points plus feedback learning.
  • A single round of Magpie data alone surpasses prior public instruction sets that combined supervised fine-tuning with direct preference optimization on UltraFeedback.
  • The method scales to four million generated examples from one aligned model with no additional human prompting effort.
  • High-quality subsets of the synthetic data suffice for competitive downstream performance after standard filtering.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same left-side prompting could be iterated on the newly aligned model itself to create successive rounds of self-generated data.
  • Because the approach relies only on an already-aligned open model, it could be applied to other base models or languages to produce domain-specific alignment sets without new human effort.
  • The gap between generated data volume and retained high-quality subset suggests that automatic quality filters remain a critical control point for scaling.

Load-bearing premise

The queries the model invents from partial templates are diverse enough and close enough to real user needs to produce effective alignment after filtering.

What would settle it

Fine-tuning a base model on the filtered Magpie data and observing substantially lower scores than the official Llama-3-8B-Instruct across AlpacaEval, ArenaHard, and WildBench would falsify the central performance claim.

Original abstract

High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method for generating large-scale alignment data named Magpie. Our key observation is that aligned LLMs like Llama-3-Instruct can generate a user query when we input only the left-side templates up to the position reserved for user messages, thanks to their auto-regressive nature. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. We perform a comprehensive analysis of the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base with each dataset and evaluate the performance of the fine-tuned models. Our results indicate that in some tasks, models fine-tuned with Magpie perform comparably to the official Llama-3-8B-Instruct, despite the latter being enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie solely for SFT can surpass the performance of previous public datasets utilized for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. This advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Magpie, a method to synthesize large-scale alignment data by prompting aligned LLMs such as Llama-3-Instruct with only left-side templates to generate user queries via their autoregressive prior. This produces 4 million instruction-response pairs from which 300K high-quality instances are selected; fine-tuning Llama-3-8B-Base on this subset yields models that match or exceed the official Llama-3-8B-Instruct on alignment benchmarks (AlpacaEval, ArenaHard, WildBench) despite using far less data than the 10M-point SFT+preference pipeline.

Significance. If the central empirical claim holds under scrutiny, the work provides a practical, open route to high-quality instruction data that could reduce dependence on proprietary alignment corpora and human curation, enabling more reproducible SFT for open models. The approach leverages an existing aligned model's prior rather than external prompts, which is a clean technical contribution if the generated queries prove sufficiently diverse and the filtering is transparent.

major comments (3)
  1. [§3] §3 (Data Generation and Filtering): The selection of the 300K high-quality subset is described only at a high level; explicit quality criteria, thresholds, or scoring functions are not provided, which is load-bearing because the performance parity claim rests on this curation step rather than raw volume.
  2. [§4] §4 (Experiments): No exact benchmark scores, confidence intervals, or statistical significance tests are reported for the Magpie-tuned models versus Llama-3-8B-Instruct or prior datasets; without these, it is impossible to evaluate whether the observed comparability is robust or sensitive to the 300K selection.
  3. [§4.3] §4.3 (Contamination and Diversity): The paper does not report checks for data contamination between the Magpie-generated set and the evaluation benchmarks, nor quantitative measures (e.g., embedding diversity, n-gram overlap with real user logs) confirming that the left-template-elicited queries are representative of actual user distributions.
minor comments (2)
  1. [Abstract] The abstract and §4 refer to 'some tasks' where Magpie matches the official model; a table or figure explicitly listing per-benchmark deltas would improve clarity.
  2. [§3] Notation for the left-side template construction could be formalized (e.g., as a prompt prefix function) to aid reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to enhance transparency, provide additional quantitative details, and strengthen the experimental reporting.

Point-by-point responses
  1. Referee: [§3] §3 (Data Generation and Filtering): The selection of the 300K high-quality subset is described only at a high level; explicit quality criteria, thresholds, or scoring functions are not provided, which is load-bearing because the performance parity claim rests on this curation step rather than raw volume.

    Authors: We agree that the filtering process requires more explicit description. The 300K subset was obtained via a multi-stage pipeline: (1) automatic filters on query length (50-500 tokens), perplexity under Llama-3-8B-Base, and removal of duplicates via MinHash; (2) a quality scorer combining response coherence (via self-consistency checks) and alignment proxies (helpfulness/harmlessness scores from an auxiliary reward model); and (3) diversity-aware sampling via k-means clustering on embeddings. We will expand §3 with the precise thresholds, scoring formula, and pseudocode for reproducibility. revision: yes

  2. Referee: [§4] §4 (Experiments): No exact benchmark scores, confidence intervals, or statistical significance tests are reported for the Magpie-tuned models versus Llama-3-8B-Instruct or prior datasets; without these, it is impossible to evaluate whether the observed comparability is robust or sensitive to the 300K selection.

    Authors: The submitted manuscript contains comparative tables, but we acknowledge the absence of exact numerical values with uncertainty estimates and formal tests. In the revision we will report the precise win rates / scores on AlpacaEval, ArenaHard, and WildBench for all models, include 95% confidence intervals obtained via bootstrap resampling over 1000 iterations, and add paired statistical significance tests (e.g., bootstrap p-values) comparing Magpie-tuned models against Llama-3-8B-Instruct and prior datasets. This will clarify robustness with respect to the 300K selection. revision: yes

  3. Referee: [§4.3] §4.3 (Contamination and Diversity): The paper does not report checks for data contamination between the Magpie-generated set and the evaluation benchmarks, nor quantitative measures (e.g., embedding diversity, n-gram overlap with real user logs) confirming that the left-template-elicited queries are representative of actual user distributions.

    Authors: We did not include contamination or quantitative diversity analyses in the initial submission. We will add these in the revision: (1) contamination checks via 5-gram overlap and cosine similarity of sentence embeddings between the Magpie set and the test splits of AlpacaEval, ArenaHard, and WildBench; (2) diversity metrics including average pairwise embedding distance (using Llama-3 embeddings) and n-gram overlap statistics against public user-query corpora such as ShareGPT and LMSYS-Chat. These additions will demonstrate both low contamination risk and representativeness of the generated queries. revision: yes
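The multi-stage selection sketched in response 1 can be illustrated in miniature. The length bounds, the exact-duplicate check, and the pre-computed score field below are simplified stand-ins for the perplexity, MinHash, and reward-model machinery described there, not the paper's published pipeline:

```python
def select_high_quality(pairs, min_len=50, max_len=500, keep=300_000):
    """Reduce raw (query, response, score) triples to a curated subset.

    Stage 1: length filter on the query (crude whitespace token count).
    Stage 2: exact-duplicate removal (stand-in for MinHash dedup).
    Stage 3: keep the top-`keep` triples by quality score (stand-in
             for a reward-model / coherence scorer).
    """
    seen = set()
    kept = []
    for query, response, score in pairs:
        n_tokens = len(query.split())
        if not (min_len <= n_tokens <= max_len):
            continue
        key = query.lower()
        if key in seen:
            continue
        seen.add(key)
        kept.append((query, response, score))
    kept.sort(key=lambda t: t[2], reverse=True)
    return kept[:keep]
```

In the paper's setting this step is what turns 4M raw generations into the 300K training set, so the thresholds are load-bearing, as the referee notes.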
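The bootstrap reporting proposed in response 2 is a standard recipe; a minimal sketch, assuming per-example 0/1 win judgments of the kind AlpacaEval-style pairwise evaluation produces (the 1000-iteration count follows the response, the rest is generic):

```python
import random


def bootstrap_ci(wins, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap over per-example win/loss outcomes.

    wins: list of 0/1 outcomes. Returns (mean, ci_low, ci_high).
    """
    rng = random.Random(seed)
    n = len(wins)
    means = []
    for _ in range(n_boot):
        # Resample the examples with replacement and record the win rate.
        sample = [wins[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(wins) / n, lo, hi
```

Reporting such intervals for the Magpie-tuned models and the official Llama-3-8B-Instruct would let a reader see whether the claimed comparability survives resampling noise.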
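The contamination check proposed in response 3 can likewise be sketched. The 5-gram granularity follows the response; the embedding-cosine half of the check is omitted for brevity:

```python
def ngrams(text, n=5):
    """Set of word n-grams in a lowercased, whitespace-tokenized text."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def contamination_rate(train_texts, test_texts, n=5):
    """Fraction of test-set n-grams that occur anywhere in training text.

    A value near 0 suggests the synthetic set does not leak benchmark
    items; a high value flags likely contamination.
    """
    train_grams = set()
    for t in train_texts:
        train_grams |= ngrams(t, n)
    test_grams = set()
    for t in test_texts:
        test_grams |= ngrams(t, n)
    if not test_grams:
        return 0.0
    return len(test_grams & train_grams) / len(test_grams)
```

Run against the Magpie queries and each benchmark's test split, this one number would directly address the referee's contamination concern.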

Circularity Check

0 steps flagged

No significant circularity; derivation is empirical and externally benchmarked

Full rationale

The paper describes an empirical pipeline: prompt Llama-3-Instruct with left-side templates to auto-regressively generate user queries, pair them with model responses, filter to 300K instances, fine-tune Llama-3-8B-Base, and evaluate on independent external benchmarks (AlpacaEval, ArenaHard, WildBench). No equation or step reduces by construction to its own inputs; no parameter is fitted on a subset and then renamed as a prediction; no load-bearing claim rests on a self-citation chain or imported uniqueness theorem. The central result—that the synthesized data yields competitive alignment—is presented as an observable outcome of the procedure rather than a definitional tautology, making the argument self-contained against external metrics.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method rests on the domain assumption that aligned LLMs will produce useful queries when given partial templates, plus an unspecified selection process for the final 300K examples.

free parameters (1)
  • quality selection criteria
    The rules or thresholds used to reduce 4 million generated pairs to 300K high-quality instances are not specified.
axioms (1)
  • domain assumption: Aligned LLMs can generate coherent and useful user queries when prompted with only the left-side conversation templates.
    This is the central observation stated in the abstract that enables the entire synthesis pipeline.

pith-pipeline@v0.9.0 · 5649 in / 1322 out tokens · 55373 ms · 2026-05-16T06:54:57.162248+00:00 · methodology

discussion (0)


Forward citations

Cited by 21 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Large Language Diffusion Models

    cs.CL 2025-02 unverdicted novelty 8.0

    LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

  2. Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

    cs.LG 2025-02 unverdicted novelty 7.0

    A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.

  3. NodeSynth: Socially Aligned Synthetic Data for AI Evaluation

    cs.LG 2026-05 unverdicted novelty 6.0

    NodeSynth generates evidence-anchored synthetic queries that trigger up to five times higher failure rates in mainstream LLMs than human-authored benchmarks.

  4. Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion

    cs.AI 2026-05 unverdicted novelty 6.0

    MORA breaks the safety-helpfulness trade-off in LLM alignment by pre-sampling single-reward prompts and rewriting them to expand multi-dimensional reward diversity, yielding 5-12.4% single-preference gains in sequenti...

  5. Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion

    cs.AI 2026-05 unverdicted novelty 6.0

    MORA breaks the safety-helpfulness ceiling in LLMs by pre-sampling single-reward prompts and rewriting them to incorporate multi-dimensional intents, delivering 5-12.4% gains in sequential alignment and 4.6% overall i...

  6. PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding

    cs.CL 2026-05 unverdicted novelty 6.0

    PARD-2 uses Confidence-Adaptive Token optimization to align draft model training with acceptance length in speculative decoding, enabling dual-mode operation and up to 6.94x lossless speedup on Llama3.1-8B.

  7. Hypothesis generation and updating in large language models

    cs.LG 2026-05 unverdicted novelty 6.0

    LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.

  8. ZAYA1-8B Technical Report

    cs.AI 2026-05 unverdicted novelty 6.0

    ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

  9. TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning

    cs.CR 2026-04 unverdicted novelty 6.0

    TwinGate deploys a stateful dual-encoder system with asymmetric contrastive learning to detect decompositional jailbreaks in untraceable LLM traffic at high recall and low false-positive rate with negligible latency.

  10. To Adapt or not to Adapt, Rethinking the Value of Medical Knowledge-Aware Large Language Models

    cs.CL 2026-04 unverdicted novelty 6.0

    Domain-adapted clinical LLMs provide only marginal and unstable gains over general models on English clinical MCQA benchmarks, while new Spanish Marmoka models perform better.

  11. SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

    cs.DC 2026-02 unverdicted novelty 6.0

    SPEED-Bench is a new standardized benchmark for speculative decoding that supplies semantically diverse qualitative data and throughput-oriented splits across concurrency levels, integrated with vLLM and TensorRT-LLM.

  12. Multi-Token Prediction via Self-Distillation

    cs.CL 2026-02 unverdicted novelty 6.0

    Self-distillation turns pretrained autoregressive LMs into multi-token predictors that decode over 3x faster with under 5% accuracy drop on GSM8K.

  13. SmolVLM: Redefining small and efficient multimodal models

    cs.AI 2025-04 unverdicted novelty 6.0

    SmolVLM-256M outperforms a 300-times larger model using under 1 GB GPU memory, while the 2.2B version matches state-of-the-art VLMs at half the memory cost.

  14. Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

    cs.CV 2024-12 unverdicted novelty 6.0

    InternVL 2.5 is the first open-source MLLM to surpass 70% on the MMMU benchmark via model, data, and test-time scaling, with a 3.7-point gain from chain-of-thought reasoning.

  15. Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training

    cs.CL 2026-05 unverdicted novelty 5.0

    LoPT delivers competitive LLM post-training results by training only the top half on the task objective and using feature reconstruction to update the bottom half.

  16. Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training

    cs.CL 2026-05 unverdicted novelty 5.0

    LoPT achieves competitive task performance in LLM post-training by limiting task gradients to the upper model half and training the lower half with local feature reconstruction.

  17. Kimi-Audio Technical Report

    eess.AS 2025-04 unverdicted novelty 5.0

    Kimi-Audio is an open-source audio foundation model that achieves state-of-the-art results on speech recognition, audio understanding, question answering, and conversation after pre-training on more than 13 million ho...

  18. SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

    cs.CL 2025-02 unverdicted novelty 5.0

    SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.

  19. DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

    cs.CV 2024-12 accept novelty 5.0

    DeepSeek-VL2 is a series of MoE vision-language models using dynamic tiling and latent attention that reach competitive or state-of-the-art results on VQA, OCR, document understanding and grounding with 1.0B to 4.5B a...

  20. LLaVA-OneVision: Easy Visual Task Transfer

    cs.CV 2024-08 unverdicted novelty 5.0

    LLaVA-OneVision is the first single open LMM to simultaneously achieve strong performance in single-image, multi-image, and video scenarios with cross-scenario transfer capabilities.

  21. VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

    cs.CV 2025-01 unverdicted novelty 4.0

    VideoLLaMA3 uses a vision-centric training paradigm and token-reduction design to reach competitive results on image and video benchmarks.

Reference graph

Works this paper leans on

126 extracted references · 126 canonical work pages · cited by 19 Pith papers · 21 internal anchors

  1. [5]

    and Stoica, Ion and Xing, Eric P

    Chiang, Wei-Lin and Li, Zhuohan and Lin, Zi and Sheng, Ying and Wu, Zhanghao and Zhang, Hao and Zheng, Lianmin and Zhuang, Siyuan and Zhuang, Yonghao and Gonzalez, Joseph E. and Stoica, Ion and Xing, Eric P. , month =. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90\ url =

  2. [6]

    EMNLP , year=

    Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts , author=. EMNLP , year=

  3. [8]

    Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency , pages=

    What does it mean for a language model to preserve privacy? , author=. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency , pages=

  4. [9]

    2018 , publisher=

    Improving language understanding by generative pre-training , author=. 2018 , publisher=

  5. [11]

    International Conference on Machine Learning , pages=

    Bag of tricks for training data extraction from language models , author=. International Conference on Machine Learning , pages=. 2023 , organization=

  6. [14]

    Advances in Neural Information Processing Systems , volume=

    Emergent and predictable memorization in large language models , author=. Advances in Neural Information Processing Systems , volume=

  7. [15]

    2024 , eprint=

    MAmmoTH2: Scaling Instructions from the Web , author=. 2024 , eprint=

  8. [17]

    ArXiv , year=

    MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning , author=. ArXiv , year=

  9. [20]

    International Conference on Learning Representations , year=

    Thieves on Sesame Street! Model Extraction of BERT-based APIs , author=. International Conference on Learning Representations , year=

  10. [21]

    30th USENIX Security Symposium (USENIX Security 21) , pages=

    Extracting training data from large language models , author=. 30th USENIX Security Symposium (USENIX Security 21) , pages=

  11. [22]

    Hashimoto , title =

    Xuechen Li and Tianyi Zhang and Yann Dubois and Rohan Taori and Ishaan Gulrajani and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto , title =. GitHub repository , howpublished =. 2023 , publisher =

  12. [23]

    Advances in Neural Information Processing Systems , volume=

    Lima: Less is more for alignment , author=. Advances in Neural Information Processing Systems , volume=

  13. [25]

    Gonzalez and Ion Stoica , booktitle=

    Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zi Lin and Zhuohan Li and Dacheng Li and Eric Xing and Hao Zhang and Joseph E. Gonzalez and Ion Stoica , booktitle=. Judging. 2023 , url=

  14. [26]

    Gonzalez and Ion Stoica , month =

    Tianle Li and Wei-Lin Chiang and Evan Frick and Lisa Dunlap and Banghua Zhu and Joseph E. Gonzalez and Ion Stoica , month =. From Live Data to High-Quality Benchmarks: The Arena-Hard Pipeline , url =

  15. [28]

    International Conference on Artificial Intelligence and Statistics , pages=

    A general theoretical paradigm to understand learning from human preferences , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2024 , organization=

  16. [32]

    Thirty-seventh Conference on Neural Information Processing Systems , year=

    Direct Preference Optimization: Your Language Model is Secretly a Reward Model , author=. Thirty-seventh Conference on Neural Information Processing Systems , year=

  17. [34]

    , author=

    Visualizing data using t-SNE. , author=. Journal of machine learning research , volume=

  18. [35]

    Hashimoto , title =

    Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto , title =. GitHub repository , howpublished =. 2023 , publisher =

  19. [39]

    WildChat: 1M Chat

    Wenting Zhao and Xiang Ren and Jack Hessel and Claire Cardie and Yejin Choi and Yuntian Deng , booktitle=. WildChat: 1M Chat. 2024 , url=

  20. [40]

    Gonzalez and Ion Stoica and Hao Zhang , booktitle=

    Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Tianle Li and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zhuohan Li and Zi Lin and Eric Xing and Joseph E. Gonzalez and Ion Stoica and Hao Zhang , booktitle=. 2024 , url=

  21. [45]

    2023 , publisher =

    OpenHermes 2.5: An Open Dataset of Synthetic Data for Generalist LLM Assistants , author =. 2023 , publisher =

  22. [46]

    and Khashabi, Daniel and Hajishirzi, Hannaneh

    Wang, Yizhong and Kordi, Yeganeh and Mishra, Swaroop and Liu, Alisa and Smith, Noah A. and Khashabi, Daniel and Hajishirzi, Hannaneh. Self-Instruct: Aligning Language Models with Self-Generated Instructions. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023

  23. [47]

    Advances in Neural Information Processing Systems , volume=

    Camel: Communicative agents for" mind" exploration of large language model society , author=. Advances in Neural Information Processing Systems , volume=

  24. [49]

    The Twelfth International Conference on Learning Representations , year=

    What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning , author=. The Twelfth International Conference on Learning Representations , year=

  25. [50]

    2022 , eprint=

    Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback , author=. 2022 , eprint=

  26. [51]

    International Conference on Learning Representations , year=

    Finetuned Language Models are Zero-Shot Learners , author=. International Conference on Learning Representations , year=

  27. [52]

    arXiv preprint arXiv:2309.15025 , year=

    Large language model alignment: A survey , author=. arXiv preprint arXiv:2309.15025 , year=

  28. [56]

    WildBench: Benchmarking Language Models with Challenging Tasks from Real Users in the Wild , author =

  29. [60]

    Thirteenth international conference on the principles of knowledge representation and reasoning , year=

    The winograd schema challenge , author=. Thirteenth international conference on the principles of knowledge representation and reasoning , year=

  30. [62]

    2023 , publisher =

    Edward Beeching and Clémentine Fourrier and Nathan Habib and Sheon Han and Nathan Lambert and Nazneen Rajani and Omar Sanseviero and Lewis Tunstall and Thomas Wolf , title =. 2023 , publisher =

  31. [63]

    Introducing the next generation of Claude , author=

  32. [64]

    Our next-generation model: Gemini 1.5 , author=

  33. [65]

    2023 , publisher =

    OpenHermes Dataset , author =. 2023 , publisher =

  34. [66]

    2023 , publisher =

    Databricks Dolly-15k , author =. 2023 , publisher =

  35. [67]

    2024 , eprint=

    The Faiss library , author=. 2024 , eprint=

  36. [68]

    Zephyr: Direct Distillation of LM Alignment

    Zephyr: Direct distillation of lm alignment , author=. arXiv preprint arXiv:2310.16944 , year=

  37. [69]

    Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year=

    Efficient Memory Management for Large Language Model Serving with PagedAttention , author=. Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles , year=

  38. [70]

    Advances in Neural Information Processing Systems , volume=

    Principle-driven self-alignment of language models from scratch with minimal human supervision , author=. Advances in Neural Information Processing Systems , volume=

  39. [72]

    2024 , eprint=

    Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint , author=. 2024 , eprint=

  40. [75]

    International Conference on Learning Representations , year=

    Multitask Prompted Training Enables Zero-Shot Task Generalization , author=. International Conference on Learning Representations , year=

  41. [76]

    OpenAssistant Conversations - Democratizing Large Language Model Alignment , url =

    K\". OpenAssistant Conversations - Democratizing Large Language Model Alignment , url =. Advances in Neural Information Processing Systems , editor =

  42. [80]

    Tran, Hoang and Glaze, Chris and Hancock, Braden , title =

  43. [81]

    Forty-first International Conference on Machine Learning , year=

    TrustLLM: Trustworthiness in Large Language Models , author=. Forty-first International Conference on Machine Learning , year=

  44. [84]

    Llama Team , title =

  45. [85]

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, et al. Phi-3 technical report: A highly capable language model locally on your phone. arXiv preprint arXiv:2404.14219, 2024

  46. [86]

    GPT-4 Technical Report

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023

  47. [87]

    A general theoretical paradigm to understand learning from human preferences

    Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Remi Munos, Mark Rowland, Michal Valko, and Daniele Calandriello. A general theoretical paradigm to understand learning from human preferences. In International Conference on Artificial Intelligence and Statistics, pp.\ 4447--4455. PMLR, 2024

  48. [88]

    Qwen Technical Report

    Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng X...

  49. [89]

    Special characters attack: Toward scalable training data extraction from large language models

    Yang Bai, Ge Pei, Jindong Gu, Yong Yang, and Xingjun Ma. Special characters attack: Toward scalable training data extraction from large language models. arXiv preprint arXiv:2405.05990, 2024

  50. [90]

    Training a helpful and harmless assistant with reinforcement learning from human feedback, 2022

    Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, ...

  51. [91]

    Open llm leaderboard

    Edward Beeching, Clémentine Fourrier, Nathan Habib, Sheon Han, Nathan Lambert, Nazneen Rajani, Omar Sanseviero, Lewis Tunstall, and Thomas Wolf. Open llm leaderboard. https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard, 2023

  52. [92]

    Emergent and predictable memorization in large language models

    Stella Biderman, Usvsn Prashanth, Lintang Sutawika, Hailey Schoelkopf, Quentin Anthony, Shivanshu Purohit, and Edward Raff. Emergent and predictable memorization in large language models. Advances in Neural Information Processing Systems, 36, 2023

  53. [93]

    What does it mean for a language model to preserve privacy?

    Hannah Brown, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri, and Florian Tramèr. What does it mean for a language model to preserve privacy? In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp.\ 2280--2292, 2022

  54. [94]

    Extracting training data from large language models

    Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pp.\ 2633--2650, 2021

  55. [95]

    GenQA: Generating Millions of Instructions from a Handful of Prompts

    Jiuhai Chen, Rifaa Qadri, Yuxin Wen, Neel Jain, John Kirchenbauer, Tianyi Zhou, and Tom Goldstein. Genqa: Generating millions of instructions from a handful of prompts. arXiv preprint arXiv:2406.10323, 2024

  56. [96]

    Alpagasus: Training a better alpaca with fewer data

    Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, et al. Alpagasus: Training a better alpaca with fewer data. arXiv preprint arXiv:2307.08701, 2023

  57. [97]

    Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality

    Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E. Gonzalez, Ion Stoica, and Eric P. Xing. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality, 2023. URL https://lmsys.org/blog/2023-03-30-vicuna/

  58. [98]

    Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

    Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? try arc, the ai2 reasoning challenge. arXiv preprint arXiv:1803.05457, 2018

  59. [99]

    Training Verifiers to Solve Math Word Problems

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021

  60. [100]

    UltraFeedback: Boosting Language Models with High-quality Feedback

    Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Wei Zhu, Yuan Ni, Guotong Xie, Zhiyuan Liu, and Maosong Sun. Ultrafeedback: Boosting language models with high-quality feedback. arXiv preprint arXiv:2310.01377, 2023

  61. [101]

    Databricks dolly-15k, 2023

    Databricks. Databricks dolly-15k, 2023. URL https://huggingface.co/datasets/databricks/databricks-dolly-15k

  62. [102]

    On the limitations of reference-free evaluations of generated text

    Daniel Deutsch, Rotem Dror, and Dan Roth. On the limitations of reference-free evaluations of generated text. In Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (eds.), Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp.\ 10960--10977, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Lin...

  63. [103]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018

  64. [104]

    Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

    Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Zhi Zheng, Shengding Hu, Zhiyuan Liu, Maosong Sun, and Bowen Zhou. Enhancing chat language models by scaling high-quality instructional conversations. arXiv preprint arXiv:2305.14233, 2023

  65. [105]

    The Faiss Library

    Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. The Faiss library. 2024

  66. [106]

    The Llama 3 Herd of Models

    Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

  67. [107]

    Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators

    Yann Dubois, Balázs Galambosi, Percy Liang, and Tatsunori B Hashimoto. Length-controlled alpacaeval: A simple way to debias automatic evaluators. arXiv preprint arXiv:2404.04475, 2024

  68. [108]

    KTO: Model Alignment as Prospect Theoretic Optimization

    Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, and Douwe Kiela. Kto: Model alignment as prospect theoretic optimization. arXiv preprint arXiv:2402.01306, 2024

  69. [109]

    Better synthetic data by retrieving and transforming existing datasets

    Saumya Gandhi, Ritu Gala, Vijay Viswanathan, Tongshuang Wu, and Graham Neubig. Better synthetic data by retrieving and transforming existing datasets. arXiv preprint arXiv:2404.14361, 2024

  70. [110]

    Are We Done with MMLU?

    Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao, Xiaotang Du, Mohammad Reza Ghasemi Madani, et al. Are we done with mmlu? arXiv preprint arXiv:2406.04127, 2024

  71. [111]

    Measuring Massive Multitask Language Understanding

    Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300, 2020

  72. [112]

    ORPO: Monolithic Preference Optimization without Reference Model

    Jiwoo Hong, Noah Lee, and James Thorne. ORPO: Monolithic preference optimization without reference model. arXiv preprint arXiv:2403.07691, 2024

  73. [113]

    TrustLLM: Trustworthiness in Large Language Models

    Yue Huang, Lichao Sun, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Hanchi Sun, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric P. Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang, Huan Zhang, Huaxiu Yao, Mano...

  74. [114]

    Camels in a changing climate: Enhancing lm adaptation with tulu 2

    Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A Smith, Iz Beltagy, et al. Camels in a changing climate: Enhancing lm adaptation with tulu 2. arXiv preprint arXiv:2311.10702, 2023

  75. [115]

    Alpaca against vicuna: Using llms to uncover memorization of llms

    Aly M Kassem, Omar Mahmoud, Niloofar Mireshghallah, Hyunwoo Kim, Yulia Tsvetkov, Yejin Choi, Sherif Saad, and Santu Rana. Alpaca against vicuna: Using llms to uncover memorization of llms. arXiv preprint arXiv:2403.04801, 2024

  76. [116]

    OpenAssistant Conversations - Democratizing Large Language Model Alignment

    Andreas Köpf, Yannic Kilcher, Dimitri von Rütte, Sotiris Anagnostidis, Zhi Rui Tam, Keith Stevens, Abdullah Barhoum, Duc Nguyen, Oliver Stanley, Richárd Nagyfi, Shahul ES, Sameer Suri, David Glushkov, Arnav Dantuluri, Andrew Maguire, Christoph Schuhmann, Huu Nguyen, and Alexander Mattick. Openassistant conversations - democratizing large langu...

  77. [117]

    Thieves on Sesame Street! Model Extraction of BERT-based APIs

    Kalpesh Krishna, Gaurav Singh Tomar, Ankur P. Parikh, Nicolas Papernot, and Mohit Iyyer. Thieves on sesame street! model extraction of bert-based apis. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=Byl5NREFDr

  78. [118]

    Efficient Memory Management for Large Language Model Serving with PagedAttention

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with pagedattention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023

  79. [119]

    Rewardbench: Evaluating reward models for language modeling

    Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, et al. Rewardbench: Evaluating reward models for language modeling. arXiv preprint arXiv:2403.13787, 2024

  80. [120]

    The winograd schema challenge

    Hector Levesque, Ernest Davis, and Leora Morgenstern. The winograd schema challenge. In Thirteenth international conference on the principles of knowledge representation and reasoning, 2012
