AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

Carlos Guestrin; Ishaan Gulrajani; Jimmy Ba; Percy Liang; Rohan Taori; Tatsunori B. Hashimoto; Tianyi Zhang; Xuechen Li; Yann Dubois

arxiv: 2305.14387 · v4 · pith:LOS5CL3Pnew · submitted 2023-05-22 · 💻 cs.LG · cs.AI · cs.CL

classification 💻 cs.LG cs.AI cs.CL

keywords alpacafarm, feedback, human, models, methods, reference, challenges, cost

Large language models (LLMs) such as ChatGPT have seen widespread adoption due to their strong instruction-following abilities. Developing these LLMs involves a complex yet poorly understood workflow requiring training with human feedback. Replicating and understanding this instruction-following requires tackling three major challenges: the high cost of data collection, the lack of trustworthy evaluation, and the absence of reference method implementations. We address these challenges with AlpacaFarm, a simulator that enables research and development for learning from feedback at a low cost. First, we design LLM prompts to simulate human feedback that are 50x cheaper than crowdworkers and display high agreement with humans. Second, we propose an automatic evaluation and validate it against human instructions obtained on real-world interactions. Third, we contribute reference implementations for several methods (PPO, DPO, best-of-n, expert iteration, and more) that learn from pairwise feedback. Finally, as an end-to-end validation of AlpacaFarm, we train and evaluate eleven models on 10k pairs of real human feedback and show that rankings of models trained in AlpacaFarm match rankings of models trained on human data. As a demonstration of the research possible in AlpacaFarm, we find that methods that use a reward model can substantially improve over supervised fine-tuning and that our reference PPO implementation leads to a +10% improvement in win-rate against Davinci003. We release all components of AlpacaFarm at https://github.com/tatsu-lab/alpaca_farm.
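
Two of the quantities named in the abstract, best-of-n selection and win-rate, can be illustrated with a minimal Python sketch. This is not the AlpacaFarm implementation: the function names `best_of_n`, `win_rate`, and `toy_reward` are hypothetical, and the toy reward (a word count) merely stands in for a learned reward model. The win-rate convention used here (1.0 for a win, 0.0 for a loss, 0.5 for a tie) is likewise an assumption.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], reward_fn: Callable[[str], float]) -> str:
    """Return the candidate with the highest reward (the first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=reward_fn)


def win_rate(preferences: Sequence[float]) -> float:
    """Average pairwise outcome against a baseline.

    Each entry is assumed to be 1.0 (win), 0.0 (loss), or 0.5 (tie).
    """
    if not preferences:
        raise ValueError("need at least one comparison")
    return sum(preferences) / len(preferences)


def toy_reward(text: str) -> float:
    # Hypothetical stand-in for a learned reward model: counts words.
    return float(len(text.split()))


if __name__ == "__main__":
    samples = ["a b", "a b c", "a"]
    print(best_of_n(samples, toy_reward))
    print(win_rate([1.0, 0.0, 0.5, 1.0]))
```

In practice the reward function would be a model trained on pairwise feedback, and the preference list would come from a human or simulated annotator comparing each output against a baseline such as Davinci003.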

Forward citations

Cited by 26 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

C3-Bench: A Context-Aware Change Captioning Benchmark
cs.CV 2026-06 unverdicted novelty 7.0

C3-Bench supplies a multi-domain dataset and LLM-based evaluation protocol that exposes systematic failures in existing change captioning models outside their training regimes.
Self-Rewarding Language Models
cs.CL 2024-01 conditional novelty 7.0

Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.
How Do Tool-Augmented LLM Agents Perform on Real-World Energy Analytics Tasks?
cs.AI 2026-06 unverdicted novelty 6.0

An empirical study evaluating tool-augmented LLM agents on 243 real-world energy analytics problems across data retrieval, knowledge interpretation, and quantitative modeling using domain-specific tools and multi-dime...
Think-Before-Speak: From Internal Evaluation to Public Expression in Multi-Agent Social Simulation
cs.AI 2026-06 unverdicted novelty 6.0

TBS is an interval-based multi-agent framework that separates private internal-state updates (dissonance appraisal, opinion climate, isolation risk, response strategy, willingness to speak) from public utterance selec...
Think-Before-Speak: From Internal Evaluation to Public Expression in Multi-Agent Social Simulation
cs.AI 2026-06 unverdicted novelty 6.0

TBS is an interval-based multi-agent LLM simulation framework that separates structured internal evaluative states from public utterance generation and shows these states vary systematically with turn-allocation, sile...
"I've Seen How This Goes": Characterizing Diversity via Progressive Conditional Surprise
cs.CL 2026-06 unverdicted novelty 6.0

Decan (D_Ca_n = C × a_n) measures text diversity as progressive conditional surprise from base LM log-probabilities, scoring 0.846 OCA on McDiv benchmark and detecting monotonic diversity drop across base→SFT→DPO→RLVR stages.
OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces
cs.AI 2026-05 unverdicted novelty 6.0

OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.
Diversity in Large Language Models under Supervised Fine-Tuning
cs.LG 2026-04 unverdicted novelty 6.0

TOFU loss mitigates the narrowing of generative diversity in LLMs after supervised fine-tuning by addressing neglect of low-frequency patterns and forgetting of prior knowledge.
On the Shelf Life of Fine-Tuned LLM-Judges: Future-Proofing, Backward-Compatibility, and Question Generalization
cs.CL 2025-09 unverdicted novelty 6.0

Fine-tuned LLM judges struggle with future-proofing to newer generators but maintain backward-compatibility more easily; DPO training and continual learning improve adaptation while all models degrade on unseen questions.
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
cs.AI 2024-08 conditional novelty 6.0

Empirical analysis shows scaling inference compute via strategies like tree search can be more efficient than scaling model parameters, with 7B models plus novel search outperforming 34B models.
Corrective Retrieval Augmented Generation
cs.CL 2024-01 unverdicted novelty 6.0

CRAG improves RAG robustness via a retrieval quality evaluator that triggers web augmentation and a decompose-recompose filter to focus on relevant information, yielding better results on short- and long-form generati...
The Falcon Series of Open Language Models
cs.CL 2023-11 conditional novelty 6.0

Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
cs.CL 2023-10 unverdicted novelty 6.0

Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.
Aligning Large Multimodal Models with Factually Augmented RLHF
cs.CV 2023-09 conditional novelty 6.0

Factually Augmented RLHF aligns large multimodal models to reduce hallucinations, reaching 94% of GPT-4 on LLaVA-Bench and 60% improvement on the new MMHAL-BENCH.
Chain-of-Verification Reduces Hallucination in Large Language Models
cs.CL 2023-09 unverdicted novelty 6.0

Chain-of-Verification reduces hallucinations in large language models by drafting responses, planning independent verification questions, answering them separately, and generating a final verified output.
Baseline Defenses for Adversarial Attacks Against Aligned Language Models
cs.LG 2023-09 conditional novelty 6.0

Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.
Textbooks Are All You Need
cs.CL 2023-06 unverdicted novelty 6.0

A 1.3B-parameter code model trained on 7B tokens of curated textbook and synthetic data achieves 50.6% on HumanEval, indicating data quality can enable strong performance at small scale.
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
cs.CL 2023-06 accept novelty 6.0

GPT-4 as an LLM judge achieves over 80% agreement with human preferences on MT-Bench and Chatbot Arena, matching human agreement levels and providing a scalable evaluation method.
Large Language Models are not Fair Evaluators
cs.CL 2023-05 conditional novelty 6.0

LLMs show strong position bias when scoring model outputs, allowing easy manipulation of rankings, but calibration with multiple evidence, position balancing, and selective human input reduces this bias to better matc...
Activation- and Influence-Aware Ranks (AIR): Function-Preserving SVD Compression for LLMs
cs.LG 2026-06 unverdicted novelty 5.0

AIR augments activation-aware SVD compression of LLMs with an influence metric and a closed-form ALS update, claiming >18% perplexity improvement at 60% parameter retention and 90% less calibration data than SVD-LLM(W).
AURA: Adaptive Uncertainty-aware Refinement for LLM-as-a-Judge Auditing
stat.ML 2026-06 unverdicted novelty 5.0

AURA is an adaptive uncertainty-aware refinement method for auditing LLM-as-a-judge pairwise decisions that learns human-consistency signals through selective human verification on uncertain cases.
Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning
cs.LG 2026-05 unverdicted novelty 5.0

SymNoise applies symmetric noise to embeddings during instruction fine-tuning and reports 6.7% higher AlpacaEval scores than NEFTune on LLaMA-2-7B.
Diversity in Large Language Models under Supervised Fine-Tuning
cs.LG 2026-04 unverdicted novelty 5.0

Supervised fine-tuning narrows LLM generative diversity through neglect of low-frequency patterns and knowledge forgetting, but the TOFU loss mitigates this effect across models and benchmarks.
LLMOrbit: A Circular Taxonomy of Large Language Models - From Scaling Walls to Agentic AI Systems
cs.LG 2026-01 unverdicted novelty 3.0

A survey taxonomy of LLMs identifies three scaling crises and six efficiency paradigms while tracing the shift from generation to tool-using agents.
Benchmark Data Contamination of Large Language Models: A Survey
cs.CL 2024-06 unverdicted novelty 3.0

A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.
A Comprehensive Overview of Large Language Models
cs.CL 2023-07 unverdicted novelty 2.0

A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.