arXiv: 1710.11277 · v2 · pith:EDTUEVUHnew · submitted 2017-10-31 · 💻 cs.CL · cs.AI · cs.LG

Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

classification 💻 cs.CL cs.AI cs.LG
keywords adversarial, dialogue, actions, actor-critic, advantage, policy, agent, discriminator

This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GANs), we train a discriminator to differentiate responses/actions generated by dialogue agents from those produced by experts. We then incorporate the discriminator as another critic into the advantage actor-critic (A2C) framework, encouraging the dialogue agent to explore regions of the state-action space where its actions resemble those of the experts. Experimental results in a movie-ticket booking domain show that the proposed Adversarial A2C can accelerate policy exploration.
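The idea of using the discriminator as a second critic can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the log-probability bonus, the `weight` mixing coefficient, the tabular softmax policy, and all function names are hypothetical and are not taken from the paper.

```python
import math


def discriminator_bonus(expert_prob, eps=1e-8):
    """Assumed bonus: log of the discriminator's probability that a
    (state, action) pair came from an expert. Clipped to avoid log(0)."""
    p = min(max(expert_prob, eps), 1.0 - eps)
    return math.log(p)


def combined_advantage(ret, value, expert_prob, weight=0.5):
    """Standard advantage (return minus value baseline) plus a weighted
    discriminator term. The additive mix and `weight` are assumptions."""
    env_advantage = ret - value
    return env_advantage + weight * discriminator_bonus(expert_prob)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits, action, advantage, lr=0.1):
    """One policy-gradient update on softmax logits, using
    d log pi(a) / d logit_i = 1[i == a] - pi_i."""
    probs = softmax(logits)
    return [
        logit + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


# Same return and baseline, but different discriminator judgements.
expert_like = combined_advantage(ret=1.0, value=0.5, expert_prob=0.9)
non_expert = combined_advantage(ret=1.0, value=0.5, expert_prob=0.1)

# An expert-like action is reinforced; a non-expert one is discouraged.
updated_logits = policy_gradient_step([0.0, 0.0, 0.0], action=1, advantage=expert_like)
```

In this sketch, an action the discriminator judges expert-like keeps most of its environment advantage, while an action judged unlike the expert's is penalized, which biases exploration toward expert-like regions of the state-action space.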
