pith. sign in

Alpaca: A strong, replicable instruction-following model

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

fields

cs.CL 1 cs.LG 1

years

2024 1 2023 1

representative citing papers

Zephyr: Direct Distillation of LM Alignment

cs.LG · 2023-10-25 · accept · novelty 6.0

Zephyr-7B achieves state-of-the-art chat benchmark results among 7B models by distilling alignment via dDPO on AI feedback preferences, surpassing the 70B Llama-2-Chat model on MT-Bench with no human data required.

citing papers explorer

Showing 2 of 2 citing papers.

  • Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive cs.CL · 2024-02-20 · conditional · none · ref 53

    DPOP is a new loss function that prevents DPO from lowering preferred response likelihoods and outperforms standard DPO on diverse datasets, MT-Bench, and enables Smaug-72B to exceed 80% on the Open LLM Leaderboard.

  • Zephyr: Direct Distillation of LM Alignment cs.LG · 2023-10-25 · accept · none · ref 80

    Zephyr-7B achieves state-of-the-art chat benchmark results among 7B models by distilling alignment via dDPO on AI feedback preferences, surpassing the 70B Llama-2-Chat model on MT-Bench with no human data required.