
arxiv: 1812.10235 · v1 · pith:T4PUNPGDnew · submitted 2018-12-26 · 💻 cs.CL · cs.AI · cs.LG

A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling

classification 💻 cs.CL · cs.AI · cs.LG
keywords intent · filling · model · slot · tasks · detection · models · semantic

Intent detection and slot filling are the two main tasks in building a spoken language understanding (SLU) system. Multiple deep learning based models have demonstrated good results on these tasks. The most effective algorithms are based on sequence-to-sequence (or "encoder-decoder") structures, and generate the intents and semantic tags using either separate models or a joint model. Most previous studies, however, either treat intent detection and slot filling as two separate parallel tasks, or use a single sequence-to-sequence model to generate both the semantic tags and the intent. Because most of these approaches use one (joint) NN based model (including the encoder-decoder structure) to model both tasks, they may not take full advantage of the cross-impact between them. In this paper, new Bi-model based RNN semantic frame parsing network structures are designed to perform the intent detection and slot filling tasks jointly, by considering their cross-impact on each other using two correlated bidirectional LSTMs (BLSTM). Our Bi-model structure with a decoder achieves state-of-the-art results on the benchmark ATIS data, with about 0.5% intent accuracy improvement and 0.9% slot filling improvement.
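
To make the two tasks in the abstract concrete, here is a minimal sketch of what a semantic frame looks like once both outputs are available: one intent label for the whole utterance, plus one slot tag per token. The sketch assumes BIO-style tags, which are commonly used with ATIS; the example utterance, the label names (`atis_flight`, `fromloc.city_name`, `toloc.city_name`), and the helper names `SemanticFrame` and `build_frame` are illustrative assumptions, not taken from the paper. It does not implement the Bi-model network itself, only the output format that such a model would predict.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SemanticFrame:
    """Hypothetical container: one intent per utterance plus named slot values."""
    intent: str
    slots: Dict[str, str]


def build_frame(tokens: List[str], tags: List[str], intent: str) -> SemanticFrame:
    """Group BIO slot tags (assumed format) into slot values and attach the intent."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    slots: Dict[str, str] = {}
    current_name: Optional[str] = None
    current_words: List[str] = []

    def flush() -> None:
        # Store the slot collected so far, if any, and reset the buffer.
        # A repeated slot name overwrites the earlier value in this sketch.
        nonlocal current_name, current_words
        if current_name is not None:
            slots[current_name] = " ".join(current_words)
        current_name, current_words = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # "B-" opens a new slot, closing any slot that was in progress.
            flush()
            current_name, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_name == tag[2:]:
            # "I-" continues the slot that is currently open.
            current_words.append(token)
        else:
            # "O" (or an "I-" tag with no matching open slot) ends the current slot.
            flush()
    flush()
    return SemanticFrame(intent=intent, slots=slots)


# Illustrative ATIS-style example; the labels here are assumptions for demonstration.
tokens = "show flights from new york to denver".split()
tags = ["O", "O", "O", "B-fromloc.city_name", "I-fromloc.city_name", "O", "B-toloc.city_name"]
frame = build_frame(tokens, tags, intent="atis_flight")
print(frame)
```

In this framing, intent detection is a single classification over the utterance and slot filling is a per-token sequence labeling problem, which is why the abstract speaks of modeling the cross-impact between the two.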



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice

    cs.CL 2019-06 unverdicted novelty 4.0

    Acoustic and emotion embeddings reduce EER for vocal expression detection by 60% and 30% relative to bag-of-words and acoustic baselines.