Pith · machine review for the scientific record

arXiv: 1901.05415 · v4 · submitted 2019-01-16 · 💻 cs.CL · cs.AI · cs.HC · cs.LG · stat.ML

Recognition: unknown

Learning from Dialogue after Deployment: Feed Yourself, Chatbot!

Authors on Pith: no claims yet
classification 💻 cs.CL · cs.AI · cs.HC · cs.LG · stat.ML
keywords dialogue · agent · chatbot · training · examples · learning · conversation · conversations
original abstract

The majority of conversations a dialogue agent sees over its lifetime occur after it has already been trained and deployed, leaving a vast store of potential training signal untapped. In this work, we propose the self-feeding chatbot, a dialogue agent with the ability to extract new training examples from the conversations it participates in. As our agent engages in conversation, it also estimates user satisfaction in its responses. When the conversation appears to be going well, the user's responses become new training examples to imitate. When the agent believes it has made a mistake, it asks for feedback; learning to predict the feedback that will be given improves the chatbot's dialogue abilities further. On the PersonaChat chit-chat dataset with over 131k training examples, we find that learning from dialogue with a self-feeding chatbot significantly improves performance, regardless of the amount of traditional supervision.
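The abstract describes the harvesting loop concretely enough to sketch: estimate satisfaction after each user message, imitate the user when things go well, and ask for feedback when they don't. Below is a minimal, hypothetical Python sketch of one turn of that loop. The `bot.respond` and `bot.estimate_satisfaction` methods, the `FEEDBACK_REQUEST` prompt, and the 0.5 threshold are illustrative assumptions, not the paper's implementation.

```python
FEEDBACK_REQUEST = "Oops, I think I messed up. What should I have said?"
SATISFACTION_THRESHOLD = 0.5  # assumed cutoff; the paper treats this as tunable


def self_feeding_turn(bot, history, user_message,
                      dialogue_examples, feedback_examples,
                      awaiting_feedback=False):
    """Process one user message, possibly extracting a new training example.

    Returns (bot_reply, awaiting_feedback_for_next_turn).
    """
    if awaiting_feedback:
        # The previous turn asked for feedback, so this user message *is*
        # the feedback: store (context, feedback) as a new example for the
        # feedback-prediction task.
        feedback_examples.append((list(history), user_message))
        history.append(user_message)
        return bot.respond(history), False

    satisfaction = bot.estimate_satisfaction(history + [user_message])
    if satisfaction >= SATISFACTION_THRESHOLD:
        # Conversation appears to be going well: the user's response
        # becomes a new imitation example for the preceding context.
        dialogue_examples.append((list(history), user_message))
        history.append(user_message)
        return bot.respond(history), False

    # Low estimated satisfaction: the agent believes it made a mistake,
    # so it asks for feedback instead of continuing normally.
    history.append(user_message)
    history.append(FEEDBACK_REQUEST)
    return FEEDBACK_REQUEST, True
```

In the paper these pieces are trained jointly as separate heads on a shared dialogue model; the sketch abstracts them behind a single `bot` object for readability.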

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Fine-Tuning Language Models from Human Preferences

cs.CL · 2019-09 · unverdicted · novelty 7.0

    Language models fine-tuned via RL on 5k-60k human preference comparisons produce stylistically better text continuations and human-preferred summaries that sometimes copy input sentences.