Dialogue learning with human-in-the-loop

Jiwei Li, Alexander H Miller, Sumit Chopra, Marc’Aurelio Ranzato, Jason Weston · arXiv 1611.09823

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Fine-Tuning Language Models from Human Preferences

cs.CL · 2019-09-18 · unverdicted · novelty 7.0

Language models fine-tuned via RL on 5k-60k human preference comparisons produce stylistically better text continuations and human-preferred summaries that sometimes copy input sentences.

citing papers explorer

Showing 1 of 1 citing paper.

Fine-Tuning Language Models from Human Preferences · cs.CL · 2019-09-18 · unverdicted · none · ref 17
