Alpaca: A strong, replicable instruction-following model

Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, Tatsunori B Hashimoto · 2023

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive

cs.CL · 2024-02-20 · conditional · novelty 6.0

DPOP is a new loss function that prevents DPO from lowering preferred response likelihoods and outperforms standard DPO on diverse datasets, MT-Bench, and enables Smaug-72B to exceed 80% on the Open LLM Leaderboard.

Zephyr: Direct Distillation of LM Alignment

cs.LG · 2023-10-25 · accept · novelty 6.0

Zephyr-7B achieves state-of-the-art chat benchmark results among 7B models by distilling alignment via dDPO on AI feedback preferences, surpassing the 70B Llama-2-Chat model on MT-Bench with no human data required.

citing papers explorer

Showing 2 of 2 citing papers.

Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive cs.CL · 2024-02-20 · conditional · none · ref 53
DPOP is a new loss function that prevents DPO from lowering preferred response likelihoods and outperforms standard DPO on diverse datasets, MT-Bench, and enables Smaug-72B to exceed 80% on the Open LLM Leaderboard.
Zephyr: Direct Distillation of LM Alignment cs.LG · 2023-10-25 · accept · none · ref 80
Zephyr-7B achieves state-of-the-art chat benchmark results among 7B models by distilling alignment via dDPO on AI feedback preferences, surpassing the 70B Llama-2-Chat model on MT-Bench with no human data required.

Alpaca: A strong, replicable instruction-following model

fields

years

verdicts

representative citing papers

citing papers explorer