A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling

Hongxia Jin; Yilin Shen; Yu Wang

arxiv: 1812.10235 · v1 · pith:T4PUNPGDnew · submitted 2018-12-26 · 💻 cs.CL · cs.AI · cs.LG

Yu Wang, Yilin Shen, Hongxia Jin

classification 💻 cs.CL · cs.AI · cs.LG

keywords intent · filling · model · slot · tasks · detection · models · semantic

Intent detection and slot filling are the two main tasks in building a spoken language understanding (SLU) system. Multiple deep learning based models have demonstrated good results on these tasks. The most effective algorithms are based on sequence-to-sequence (or "encoder-decoder") structures, and generate the intents and semantic tags using either separate models or a joint model. Most previous studies, however, either treat intent detection and slot filling as two separate parallel tasks or use a sequence-to-sequence model to generate both semantic tags and intent. Most of these approaches use one (joint) NN based model (including the encoder-decoder structure) to model both tasks, and hence may not fully take advantage of the cross-impact between them. In this paper, new Bi-model based RNN semantic frame parsing network structures are designed to perform the intent detection and slot filling tasks jointly, by considering their cross-impact on each other using two correlated bidirectional LSTMs (BLSTM). Our Bi-model structure with a decoder achieves state-of-the-art results on the benchmark ATIS data, with about 0.5% intent accuracy improvement and 0.9% slot filling improvement.
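
To make the two tasks concrete, the sketch below shows one common way to represent a semantic frame: a single intent label for the whole utterance plus per-token IOB slot tags that are collapsed into slot values. The utterance, the tag names (`fromloc`, `toloc`), the intent label, and the helper `frame_from_tags` are illustrative assumptions, not taken from the paper or from the ATIS annotation files, and the code does not implement the Bi-model network itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SemanticFrame:
    intent: str            # one label per utterance (intent detection)
    slots: Dict[str, str]  # slot name -> filled value (slot filling)


def frame_from_tags(tokens: List[str], tags: List[str], intent: str) -> SemanticFrame:
    """Collapse per-token IOB tags into slot values (hypothetical helper)."""
    slots: Dict[str, str] = {}
    current: Optional[str] = None  # slot name currently being extended
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # Start of a new slot span.
            current = tag[2:]
            slots[current] = token
        elif tag.startswith("I-") and current == tag[2:]:
            # Continuation of the span opened by the matching B- tag.
            slots[current] += " " + token
        else:
            # "O" tag or an inconsistent I- tag closes the current span.
            current = None
    return SemanticFrame(intent=intent, slots=slots)


# Illustrative utterance and labels (assumed, not from the dataset).
tokens = ["flights", "from", "new", "york", "to", "denver"]
tags = ["O", "O", "B-fromloc", "I-fromloc", "O", "B-toloc"]
frame = frame_from_tags(tokens, tags, intent="flight")
print(frame)
```

For simplicity, a slot name that occurs twice in one utterance overwrites its earlier value here; a fuller representation would keep a list of spans per slot.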

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice
cs.CL 2019-06 unverdicted novelty 4.0

Acoustic and emotion embeddings reduce EER for vocal expression detection by 60% and 30% relative to bag-of-words and acoustic baselines.