
arxiv: 1807.01554 · v1 · submitted 2018-07-04 · 💻 cs.CL · cs.AI

Recognition: unknown

Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding

classification 💻 cs.CL cs.AI
keywords: utterances, data, augmentation, dialogue, language, understanding, utterance, dataset
read the original abstract

In this paper, we study the problem of data augmentation for language understanding in task-oriented dialogue systems. In contrast to previous work, which augments an utterance without considering its relation to other utterances, we propose a sequence-to-sequence generation-based data augmentation framework that leverages an utterance's semantically equivalent alternatives in the training data. A novel diversity rank is incorporated into the utterance representation so that the model produces diverse utterances, and these diversely augmented utterances help improve the language understanding module. Experimental results on the Airline Travel Information System (ATIS) dataset and on a newly created semantic frame annotation of the Stanford Multi-turn, Multidomain Dialogue Dataset show that our framework achieves significant improvements of 6.38 and 10.04 F-score points, respectively, when the training set contains only a few hundred utterances. Case studies also confirm that our method generates diverse utterances.
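The abstract does not spell out how training pairs or the diversity rank are constructed. As a rough illustration only, the sketch below assumes that utterances are delexicalized (slot values replaced by placeholders such as `<fromloc>`), that two utterances count as semantic alternatives when they share the same intent and slot set, and that the diversity rank orders each utterance's alternatives by token-level edit distance, appended to the input as a special token like `<rank_1>`. The names `edit_distance`, `build_pairs`, and the rank-token format are hypothetical and are not the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (delexicalized utterance text, semantic frame = intent + slot names)
Utterance = Tuple[str, Tuple[str, ...]]


def edit_distance(a: List[str], b: List[str]) -> int:
    """Token-level Levenshtein distance using a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def build_pairs(data: List[Utterance]) -> List[Tuple[List[str], List[str]]]:
    """Build (source + rank token, target) pairs from same-frame alternatives.

    Assumption: rank 1 goes to the alternative that is most distant from the
    source, so a higher rank token asks for a closer paraphrase.
    """
    groups: Dict[Tuple[str, ...], List[List[str]]] = defaultdict(list)
    for text, frame in data:
        groups[frame].append(text.split())

    pairs = []
    for members in groups.values():
        for i, src in enumerate(members):
            alts = [m for j, m in enumerate(members) if j != i]
            # Stable sort: ties keep their original order even with reverse=True.
            alts.sort(key=lambda m: edit_distance(src, m), reverse=True)
            for rank, tgt in enumerate(alts, 1):
                pairs.append((src + [f"<rank_{rank}>"], tgt))
    return pairs


# Toy delexicalized data (invented for illustration, not from the paper).
data = [
    ("show flights from <fromloc> to <toloc>", ("flight", "fromloc", "toloc")),
    ("i want to fly from <fromloc> to <toloc>", ("flight", "fromloc", "toloc")),
    ("flights <fromloc> <toloc>", ("flight", "fromloc", "toloc")),
    ("what is the fare to <toloc>", ("airfare", "toloc")),
]
pairs = build_pairs(data)
for src, tgt in pairs[:2]:
    print(" ".join(src), "->", " ".join(tgt))
```

Here the three `flight` utterances yield six training pairs, while the lone `airfare` utterance yields none because it has no alternative. At generation time, a trained sequence-to-sequence model (omitted here) could be fed one training utterance with several different rank tokens to request outputs at varying distances from the input, which is one plausible way a rank signal could encourage diversity.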

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Preserving Temporal Dynamics in Time Series Generation

    cs.LG 2026-04 unverdicted novelty 5.0

    An MCMC framework enforces empirical transition laws on GAN outputs to reduce temporal drift in synthetic multivariate time series.