Reinforcement learning on a reward model trained from human summary comparisons produces summaries humans prefer over supervised fine-tuning or human references on TL;DR and transfers to CNN/DM.
Controllable neural story generation via reinforcement learning
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CL 2representative citing papers
Language models fine-tuned via RL on 5k-60k human preference comparisons produce stylistically better text continuations and human-preferred summaries that sometimes copy input sentences.
citing papers explorer
-
Learning to summarize from human feedback
Reinforcement learning on a reward model trained from human summary comparisons produces summaries humans prefer over supervised fine-tuning or human references on TL;DR and transfers to CNN/DM.
-
Fine-Tuning Language Models from Human Preferences
Language models fine-tuned via RL on 5k-60k human preference comparisons produce stylistically better text continuations and human-preferred summaries that sometimes copy input sentences.