Fine-tuning GPT-3 on human demonstrations followed by reinforcement learning from human preference rankings yields smaller models that humans judge superior to the much larger base model on instruction following, truthfulness, and reduced toxicity.
We take prompts submitted to our API along with several model completions, and have labelers rank the completions by overall quality.
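As a minimal sketch of how such rankings might be turned into a training signal, the snippet below expands one labeler ranking into pairwise preferences and scores them with a pairwise logistic loss, a common choice for reward models trained on comparisons. The function names, the `scores` dictionary, and the example values are hypothetical and introduced only for illustration; nothing here is taken from the authors' code.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked):
    """Expand a ranking (best first) into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of each pair
    # is always the one the labeler ranked higher.
    return list(combinations(ranked, 2))


def pairwise_ranking_loss(scores, ranked):
    """Average of -log(sigmoid(r_preferred - r_rejected)) over all pairs.

    scores: hypothetical mapping from completion id to a scalar reward.
    ranked: completion ids ordered best to worst by a labeler.
    """
    pairs = ranking_to_pairs(ranked)
    if not pairs:
        raise ValueError("need at least two ranked completions")
    total = 0.0
    for preferred, rejected in pairs:
        diff = scores[preferred] - scores[rejected]
        # -log(sigmoid(diff)) equals log(1 + exp(-diff))
        total += math.log1p(math.exp(-diff))
    return total / len(pairs)


# Illustrative values only: a labeler ranked completion "a" above "b" above "c".
ranking = ["a", "b", "c"]
agreeing = {"a": 2.0, "b": 0.5, "c": -1.0}
disagreeing = {"a": -1.0, "b": 0.5, "c": 2.0}
print(pairwise_ranking_loss(agreeing, ranking))
print(pairwise_ranking_loss(disagreeing, ranking))
```

The loss is lower when the scores agree with the labeler's ordering, which is the property a reward model would be trained to achieve under this assumption.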
Fields: cs.CL (1)
Years: 2022 (1)
Verdicts: ACCEPT (1)
Representative citing papers:
Training language models to follow instructions with human feedback