Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

Baolin Peng; Jianfeng Gao; Jingjing Liu; Kam-Fai Wong; Xiujun Li; Yun-Nung Chen

arXiv: 1710.11277 · v2 · submitted 2017-10-31 · cs.CL · cs.AI · cs.LG

Keywords: adversarial, dialogue, actions, actor-critic, advantage, policy, agent, discriminator

Abstract

This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GANs), we train a discriminator to distinguish the responses/actions generated by dialogue agents from those produced by experts. We then incorporate the discriminator as an additional critic in the advantage actor-critic (A2C) framework, encouraging the dialogue agent to explore state-action pairs within regions where it takes actions similar to those of the experts. Experimental results in a movie-ticket booking domain show that the proposed Adversarial A2C can effectively accelerate policy exploration.
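The abstract describes only at a high level how the discriminator joins the A2C update. The minimal Python sketch below illustrates one plausible reading under explicit assumptions. The discriminator is a logistic regression over a hand-built feature vector phi(s, a), and its output enters the advantage as an additive bonus lam * log D(s, a) alongside the usual one-step TD advantage r + gamma * V(s') - V(s). That additive log D form is borrowed from GAIL-style reward shaping and is not necessarily the paper's exact formulation. The names `Discriminator`, `combined_advantage`, `lam`, and the toy feature vectors are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class Discriminator:
    """Hypothetical logistic-regression discriminator D(s, a).

    Takes a feature vector phi(s, a) and returns the estimated
    probability that the state-action pair came from an expert.
    """

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_expert(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return sigmoid(z)

    def update(self, x, label):
        # One SGD step on binary cross-entropy (label 1 = expert, 0 = agent).
        err = self.prob_expert(x) - label
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def combined_advantage(reward, v_s, v_next, d_prob,
                       gamma=0.9, lam=0.5, done=False):
    # Environment critic: one-step TD advantage r + gamma * V(s') - V(s).
    target = reward if done else reward + gamma * v_next
    td_adv = target - v_s
    # Assumed discriminator term: log D(s, a) is near 0 for expert-like
    # actions and strongly negative for actions flagged as non-expert.
    disc_adv = math.log(max(d_prob, 1e-8))
    return td_adv + lam * disc_adv


# Usage with toy, made-up features: experts favor feature 0, the agent feature 1.
expert_batch = [[1.0, 0.0]] * 5
agent_batch = [[0.0, 1.0]] * 5
disc = Discriminator(dim=2)
for _ in range(200):
    for x in expert_batch:
        disc.update(x, 1)
    for x in agent_batch:
        disc.update(x, 0)

p_expert_like = disc.prob_expert([1.0, 0.0])
p_agent_like = disc.prob_expert([0.0, 1.0])
adv_good = combined_advantage(1.0, 0.5, 0.5, p_expert_like)
adv_bad = combined_advantage(1.0, 0.5, 0.5, p_agent_like)
print(round(p_expert_like, 3), round(p_agent_like, 3), adv_good > adv_bad)
```

In this sketch the weight `lam` trades off task reward against imitation of expert behavior: with `lam=0` the function reduces to a plain A2C advantage, which is one way to read the discriminator as an extra critic layered on the environment critic rather than a replacement for it.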
