HyST: A Hybrid Approach for Flexible and Accurate Dialogue State Tracking

Dilek Hakkani-Tür; Rahul Goel; Shachi Paul

arxiv: 1907.00883 · v1 · pith:HIE2TNSVnew · submitted 2019-07-01 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-25 11:45 UTC · model grok-4.3

keywords: dialogue state tracking · hybrid model · multi-domain dialogue · slot value prediction · candidate generation · probability distribution · MultiWOZ · neural tracking

The pith

A hybrid model learns per-slot whether to track values via full probability distributions or via candidate generation, yielding better accuracy on multi-domain dialogues.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that dialogue state tracking benefits from letting a learned component decide, for each slot type, whether to use a full-distribution estimator or a candidate-generation approach. Full-distribution methods work well on small observed value sets but fail to scale or handle unseen values; candidate methods handle scale and novelty but underperform on accuracy. The hybrid therefore routes each slot to the method that fits it best. Experiments on the MultiWOZ-2.0 dataset show the selector produces measurable gains over both pure approaches and prior state-of-the-art systems.

Core claim

HyST trains a selector that, from conversation history alone, assigns each slot type to either a distribution-based tracker or a candidate-generation tracker; the resulting system scales to multi-domain settings, tracks unseen values, and improves joint goal accuracy by 24 percent relative to the previous state of the art and 10 percent relative to the strongest single-method baseline.
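The percentages in this claim are relative, not absolute: each one divides the gain by the score of the system being compared against. The sketch below shows only that arithmetic. The accuracy values are made up for illustration and are not the paper's numbers, and `relative_improvement` is a hypothetical helper name.

```python
def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Return the gain of new_score over baseline_score as a fraction of the baseline."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return (new_score - baseline_score) / baseline_score


# Hypothetical joint goal accuracies, chosen only to illustrate the arithmetic.
hypothetical_baseline = 0.40
hypothetical_hybrid = 0.44

gain = relative_improvement(hypothetical_hybrid, hypothetical_baseline)
print(f"relative improvement: {gain:.1%}")
```

With these invented values, a gain of 4 accuracy points corresponds to a 10 percent relative improvement, which is why relative figures can look much larger than the underlying point differences.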

What carries the argument

A learned selector that, for every slot type, chooses between full-distribution estimation and candidate-set generation based on training data patterns.
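One simple way such a per-slot choice could be realized is to compare the two trackers on held-out data and keep the better one for each slot type. This is an assumption made for illustration: the summary above does not specify how the selector is trained, and the function name, labels, slot names, and accuracy values below are all hypothetical.

```python
from typing import Dict

# Hypothetical labels for the two tracking regimes discussed above.
DISTRIBUTION = "full_distribution"
CANDIDATE = "candidate_generation"


def choose_method_per_slot(
    dev_acc_distribution: Dict[str, float],
    dev_acc_candidate: Dict[str, float],
) -> Dict[str, str]:
    """Pick, for each slot type, the tracker with higher held-out accuracy.

    Ties go to the candidate-generation tracker because it can also handle
    unseen values (a design choice of this sketch, not of the paper).
    """
    routing = {}
    for slot, acc_dist in dev_acc_distribution.items():
        acc_cand = dev_acc_candidate[slot]
        routing[slot] = DISTRIBUTION if acc_dist > acc_cand else CANDIDATE
    return routing


# Made-up per-slot accuracies, for illustration only.
dev_dist = {"hotel-area": 0.95, "restaurant-name": 0.70, "train-day": 0.97}
dev_cand = {"hotel-area": 0.90, "restaurant-name": 0.82, "train-day": 0.97}

routing = choose_method_per_slot(dev_dist, dev_cand)
print(routing)
```

In this toy setup the small-vocabulary slot goes to the distribution tracker and the open-ended name slot goes to candidate generation, which mirrors the trade-off the summary describes.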

If this is right

The approach works across a rich variety of slot types without requiring slot-specific hand engineering.
It simultaneously supports large value sets and values absent from training data.
Performance gains hold when the model is trained and evaluated on the full MultiWOZ-2.0 corpus.
The selector itself adds negligible extra computation once trained.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Slot types appear to carry stable signals about which tracking regime suits them, suggesting the selector could generalize to new domains with minimal retraining.
The same selector idea could be tested on other sequence-labeling or value-prediction tasks where two complementary inference styles exist.
If the selector is made explicit rather than learned, the paper's results would indicate which observable features of a slot predict the better method.

Load-bearing premise

A selector trained only on observed slot-type patterns will consistently route each slot to the method that actually works better without itself adding errors.

What would settle it

On a held-out multi-domain dialogue corpus with the same slot vocabulary, replace the learned selector with random routing and measure whether joint goal accuracy falls below the best single-method baseline.
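The check proposed above can be sketched as follows, assuming per-turn predictions from both trackers are available. Joint goal accuracy counts a turn as correct only when every slot matches the reference state. The toy dialogue data and all function names are hypothetical.

```python
import random
from typing import Dict, List

State = Dict[str, str]  # slot type -> value


def joint_goal_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of turns whose predicted state matches the gold state exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of turns")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


def route(preds_a: List[State], preds_b: List[State], use_a: Dict[str, bool]) -> List[State]:
    """Build hybrid predictions by taking each slot from tracker A or tracker B."""
    return [
        {slot: (a[slot] if use_a[slot] else b[slot]) for slot in a}
        for a, b in zip(preds_a, preds_b)
    ]


def random_routing(slots: List[str], rng: random.Random) -> Dict[str, bool]:
    """Assign each slot type to tracker A or B uniformly at random."""
    return {slot: rng.random() < 0.5 for slot in slots}


# Toy data: tracker A is right on "area", tracker B is right on "name".
gold = [{"area": "north", "name": "curry garden"}, {"area": "south", "name": "pizza hut"}]
preds_a = [{"area": "north", "name": "none"}, {"area": "south", "name": "none"}]
preds_b = [{"area": "east", "name": "curry garden"}, {"area": "east", "name": "pizza hut"}]

informed = {"area": True, "name": False}
print("informed routing:", joint_goal_accuracy(route(preds_a, preds_b, informed), gold))

rng = random.Random(0)
random_choice = random_routing(["area", "name"], rng)
print("random routing:", joint_goal_accuracy(route(preds_a, preds_b, random_choice), gold))
```

A real version of the experiment would average the random-routing score over many seeds and compare it with the best single-method baseline on the same held-out split.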

Figures

Figures reproduced from arXiv: 1907.00883 by Dilek Hakkani-Tür, Rahul Goel, Shachi Paul.

**Figure 1.** Independent model. Diagram labels: Sentence LSTM (Es), Dialogue LSTM (Ed), and Ea at turns t-2, t-1, and t, feeding per-slot feed-forward networks S(A), S(B), ..., S(N) that score the values for Slot A through Slot N (example values: Indian, Greek, Turk). [figure not reproduced]

Original abstract

Recent works on end-to-end trainable neural network based approaches have demonstrated state-of-the-art results on dialogue state tracking. The best performing approaches estimate a probability distribution over all possible slot values. However, these approaches do not scale for large value sets commonly present in real-life applications and are not ideal for tracking slot values that were not observed in the training set. To tackle these issues, candidate-generation-based approaches have been proposed. These approaches estimate a set of values that are possible at each turn based on the conversation history and/or language understanding outputs, and hence enable state tracking over unseen values and large value sets however, they fall short in terms of performance in comparison to the first group. In this work, we analyze the performance of these two alternative dialogue state tracking methods, and present a hybrid approach (HyST) which learns the appropriate method for each slot type. To demonstrate the effectiveness of HyST on a rich-set of slot types, we experiment with the recently released MultiWOZ-2.0 multi-domain, task-oriented dialogue-dataset. Our experiments show that HyST scales to multi-domain applications. Our best performing model results in a relative improvement of 24% and 10% over the previous SOTA and our best baseline respectively.
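The abstract's description of candidate generation can be illustrated with a toy generator. This is a minimal sketch that assumes candidates are word n-grams drawn from the conversation history (the Results excerpt further down mentions n-gram-based candidate generation). Whitespace tokenization, the `max_n` parameter, and the function name are assumptions, and the scoring model that would rank each candidate per slot is not shown.

```python
from typing import List, Set


def ngram_candidates(utterances: List[str], max_n: int = 3) -> Set[str]:
    """Collect all word n-grams (n = 1..max_n) from the conversation so far."""
    candidates: Set[str] = set()
    for utterance in utterances:
        tokens = utterance.lower().split()
        for n in range(1, max_n + 1):
            for start in range(len(tokens) - n + 1):
                candidates.add(" ".join(tokens[start:start + n]))
    return candidates


history = ["i want a cheap greek restaurant", "somewhere in the north please"]
cands = ngram_candidates(history, max_n=2)
print("greek" in cands, "cheap greek" in cands, "italian" in cands)  # True True False
```

Because the candidate set is built from the dialogue itself, a value never seen in training can still be proposed as long as the user says it. A value that is never mentioned ("italian" here) cannot be.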

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

HyST's per-slot learned choice between full-distribution and candidate-generation DST is a practical idea worth testing, but the 24% gain claim rests on thin evidence without selector analysis or strong ablations.

The paper introduces a hybrid DST model that trains a selector to pick, per slot type, between estimating a full distribution over values and generating candidates from history. This targets the scalability problem of the first approach and the weaker accuracy of the second, and they test it on MultiWOZ-2.0 across multiple domains. The core move—letting the model decide the method slot-by-slot rather than applying one uniformly—is the actual novelty here, and it makes sense for real applications with mixed slot types. They report a 24% relative lift over prior SOTA and 10% over their own best baseline, which would be useful if it holds. The main soft spot is that the abstract gives no numbers on how accurate the selector itself is, no ablation that forces one method across all slots, and no breakdown of errors on unseen values or new domains. Without those, the gains could come from lucky post-hoc selection rather than reliable adaptation. The stress-test worry about the selector adding its own error or overfitting to training value distributions looks like a real gap. This is aimed at dialogue-systems researchers who already work with MultiWOZ-style data and need something that scales beyond small value sets. A reader already familiar with the two base paradigms would get the most out of it. The work deserves peer review because the problem is concrete and the hybrid framing is worth checking with proper controls, even if the current experiments need more detail to be convincing.

Referee Report

2 major / 1 minor

Summary. The paper proposes HyST, a hybrid dialogue state tracking approach that learns a selector to choose, per slot type, between full probability distribution estimation over all values and candidate-generation methods. Experiments on the MultiWOZ-2.0 dataset are reported to yield a 24% relative improvement over prior state-of-the-art and 10% over the authors' best baseline, with the hybrid claimed to scale to multi-domain settings while handling large and unseen value sets.

Significance. If the selector reliably adapts without adding substantial error, the hybrid could usefully combine the accuracy of distribution-based trackers with the scalability of candidate-based ones on realistic multi-domain data; the empirical gains on a public benchmark would then constitute a practical contribution to task-oriented dialogue systems.

major comments (2)

[Experiments / Method (selector description)] The central empirical claim (24%/10% relative gains) rests on the learned selector correctly choosing the tracking method per slot type from MultiWOZ-2.0 training data alone. No selector accuracy metrics, per-slot confusion analysis, or ablation that forces a single method across all slots is described, leaving open the possibility that reported gains are artifacts of post-hoc selection rather than robust per-slot adaptation.
[Experiments] The abstract and results assert relative improvements without reporting exact baseline implementations, data splits, statistical significance tests, or error analysis that would allow assessment of whether the hybrid outperforms both pure methods on the same splits.

minor comments (1)

[Abstract] The phrase 'rich-set of slot types' in the abstract should be 'rich set'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We respond to each major comment below.

Referee: [Experiments / Method (selector description)] The central empirical claim (24%/10% relative gains) rests on the learned selector correctly choosing the tracking method per slot type from MultiWOZ-2.0 training data alone. No selector accuracy metrics, per-slot confusion analysis, or ablation that forces a single method across all slots is described, leaving open the possibility that reported gains are artifacts of post-hoc selection rather than robust per-slot adaptation.

Authors: The selector is trained jointly in an end-to-end manner with the two tracking methods on the MultiWOZ-2.0 training data; selection decisions are therefore not post-hoc but emerge from optimization. The fact that the hybrid outperforms both pure distribution-based and pure candidate-based baselines on the same data provides evidence that per-slot adaptation is beneficial. We nevertheless agree that selector accuracy metrics, per-slot confusion matrices, and a forced-single-method ablation would make the adaptation claim more transparent, and we will add these analyses in the revision.

Revision: yes

Referee: [Experiments] The abstract and results assert relative improvements without reporting exact baseline implementations, data splits, statistical significance tests, or error analysis that would allow assessment of whether the hybrid outperforms both pure methods on the same splits.

Authors: We will expand the experimental section to include (i) precise descriptions and hyper-parameter settings of all baselines, (ii) confirmation that the standard MultiWOZ-2.0 train/dev/test splits were used, (iii) statistical significance tests (e.g., bootstrap or paired t-tests) on the reported metrics, and (iv) a concise error analysis comparing the hybrid against the two pure methods on the same splits.

Revision: yes
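As an illustration of the kind of bootstrap test mentioned in the response, the sketch below resamples turns with replacement and counts how often one system scores higher than the other. It assumes per-turn 0/1 joint-correctness indicators for both systems on the same turns. The function name, the number of resamples, and the outcome data are hypothetical, and nothing here comes from the paper's experiments.

```python
import random
from typing import List


def paired_bootstrap_win_rate(
    correct_a: List[int],
    correct_b: List[int],
    num_samples: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate how often system A beats system B under resampling of turns.

    correct_a[i] and correct_b[i] are 1 if the system got turn i jointly
    correct, else 0. Turns are resampled with replacement, keeping the pairing.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(correct_a)
    wins = 0
    for _ in range(num_samples):
        indices = [rng.randrange(n) for _ in range(n)]
        score_a = sum(correct_a[i] for i in indices)
        score_b = sum(correct_b[i] for i in indices)
        if score_a > score_b:
            wins += 1
    return wins / num_samples


# Made-up per-turn outcomes for two systems on the same 10 turns.
hybrid = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
baseline = [1, 0, 1, 0, 1, 0, 1, 1, 0, 1]
print(paired_bootstrap_win_rate(hybrid, baseline))
```

Turns within one dialogue are not independent, so resampling whole dialogues rather than individual turns would likely be the more defensible variant on real data.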

Circularity Check

0 steps flagged

No circularity: empirical hybrid model with no derivations or self-referential reductions

The paper introduces HyST as a learned selector between two existing DST paradigms (full-distribution vs. candidate generation) and reports empirical gains on MultiWOZ-2.0. No equations, parameter-fitting steps, or derivations are described that could reduce a claimed prediction to its own inputs by construction. No load-bearing self-citations or uniqueness theorems are invoked. The central result is a standard supervised model evaluated on held-out data; the reported 24%/10% relative improvements are therefore external measurements rather than tautological restatements of fitted quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no concrete free parameters, axioms, or invented entities; all model internals remain unspecified.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "HyST which learns the appropriate method for each slot type... Our best performing model results in a relative improvement of 24% and 10% over the previous SOTA"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "hybrid approach (HyST) which learns the appropriate method for each slot type"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 5 internal anchors

[1] fancy restaurant. Introduction: Task-oriented dialogue systems aim to enable users to accomplish tasks through spoken interactions. Dialogue state tracking in task-oriented dialogue systems has been proposed as a part of dialogue management and aims to estimate the belief of the dialogue system on the state of a conversation given the entire previous conversation context ...
[2] HyST: A Hybrid Approach for Flexible and Accurate Dialogue State Tracking. Related Work: Dialogue state tracking (or belief tracking) aims to maintain a distribution over possible dialogue states [9, 10], which are often represented as a set of key-value pairs. The dialogue states are then used when interacting with the external backend knowledge base or action sources in determining what the next system action should be. Previ...
[3] Methodology: A dialogue $D$ with $N$ turns is denoted as a series of agent ($a_i$) and user ($u_i$) turns, i.e. $a_1, u_1, a_2, u_2, \ldots, a_N, u_N$. The task of state tracking is to predict the state ($S_i$) after each user turn, $u_i$, of the conversation. The conversation state ($S_i$) is commonly defined as a set of slot values, $s^k_i$, for slot types $s^k$, where $k \in \{1, \ldots, T\}$ w...
[4] User utterance encoder ($E_i$): We use a biLSTM to encode each utterance, $u_i = w^i_1, \ldots, w^i_{n_i}$, where $n_i$ denotes the number of tokens in $u_i$, and the final utterance representation for utterance $e_i$ is obtained by concatenating the last hidden state of the forward LSTM, $\overrightarrow{\mathrm{LSTM}}$, and the first hidden state of the backward LSTM, $\overleftarrow{\mathrm{LSTM}}$: $E_i = \overleftarrow{\mathrm{LSTM}}_{\text{sent}}(u_i) \oplus \overrightarrow{\mathrm{LSTM}}_{\text{sent}}(u_i)$ ...
[5] Hierarchical LSTM ($Z_i$): We use a unidirectional LSTM over past user utterances to encode the dialogue context: $Z_i = \mathrm{LSTM}_{\text{dialogue}}(E_1, \ldots, E_i)$ (2)
[6] Dialogue Act LSTM ($A_i$): We use a unidirectional LSTM over agent dialogue acts to encode agent dialogue acts: $A_i = \mathrm{LSTM}_{\text{dialogueAct}}(s_1, \ldots, s_k)$. We concatenate all of these features into a context feature vector $F_{\text{context}}$. The context encoders are shared for all slots. For every slot type, we have: $F_{\text{context}} = [E_i; Z_i; A_i]$ (3), $\hat{y}_j = \mathrm{sigmoid}(FF_k(c^j_i, F_{\text{context}}))$...
[7] Data: For our state tracking experiments we use the MultiWOZ-2.0 dataset [8]. The MultiWOZ-2.0 dataset consists of multi-domain conversations from 7 domains with a total of 37 slots across domains. Some of the slots, for example, day and people, occur in multiple domains. An example conversation is shown in Table 1. For our experiments, we treat each slot i...
[8] Experimental Setup: In all experiments, we clip each turn to 30 tokens and each dialogue to past 30 turns. We use ADAM [21] for optimization with a learning rate of 0.001 and default parameters. We use a batch size of 128 while training. We initialize our embedding matrices randomly and learn them during training. We use manual search to tune all our p...
[9] Results: We present per domain results in Table 4. As in previous work, we report joint goal accuracy as our metric. For each user turn, we get the joint goal correct if our predicted state exactly matches the ground truth state for all the slots in that domain. As our candidate set generation is based on n-grams OOV rate for the OV oracle (Table 3) is hig...
[10] Conclusions: The joint tracking approach couples spoken language understanding and dialogue state tracking to achieve high accuracy on state tracking benchmarks, but this limits its performance on slots with large vocabulary as shown in our experiments. On the other hand the open-vocabulary approach is very flexible and shows better performance on large...
[11] S. Young, “Talking to machines (statistically speaking),” in Proceedings of Interspeech, 2002.
[12] J. Williams, A. Raux, D. Ramachandran, and A. Black, “The dialog state tracking challenge,” in Proceedings of the SIGDIAL 2013 Conference, 2013, pp. 404–413.
[13] M. Henderson, B. Thomson, and J. D. Williams, “The second dialog state tracking challenge,” in SIGDIAL Conference, 2014, pp. 263–272.
[14] B. Liu and I. Lane, “An end-to-end trainable neural network model with belief tracking for task-oriented dialog,” in Proceedings of Interspeech, 2017.
[15] N. Mrkšić, D. Ó Séaghdha, T.-H. Wen, B. Thomson, and S. Young, “Neural belief tracker: Data-driven dialogue state tracking,” in 55th Annual Meeting of the Association for Computational Linguistics (ACL), 2017.
[16] A. Rastogi, D. Hakkani-Tür, and L. Heck, “Scalable multi-domain dialogue state tracking,” in Automatic Speech Recognition and Understanding Workshop (ASRU), 2017 IEEE. IEEE, 2017, pp. 561–568.
[17] R. Goel, S. Paul, T. Chung, J. Lecomte, A. Mandal, and D. Hakkani-Tur, “Flexible and scalable state tracking framework for goal-oriented dialogue systems,” arXiv preprint arXiv:1811.12891, 2018.
[18] P. Budzianowski, T.-H. Wen, B.-H. Tseng, I. Casanueva, S. Ultes, O. Ramadan, and M. Gašić, “MultiWOZ - a large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.
[19] D. Bohus and A. Rudnicky, “A k hypotheses + other belief updating model,” in Proc. of the AAAI Workshop on Statistical and Empirical Methods in Spoken Dialogue Systems, vol. 62, 2006.
[20] J. D. Williams and S. Young, “Partially observable Markov decision processes for spoken dialog systems,” Computer Speech & Language, vol. 21, no. 2, pp. 393–422, 2007.
[21] Z. Wang and O. Lemon, “A simple and generic belief tracking mechanism for the dialog state tracking challenge: On the believability of observed information,” in Proceedings of the SIGDIAL 2013 Conference, 2013, pp. 423–432.
[22] B. Thomson and S. Young, “Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems,” Computer Speech & Language, vol. 24, no. 4, pp. 562–588, 2010.
[23] S. Lee and M. Eskenazi, “Recipe for building robust spoken dialog state trackers: Dialog state tracking challenge system description,” in Proceedings of the SIGDIAL 2013 Conference, 2013, pp. 414–422.
[24] M. Henderson, B. Thomson, and S. Young, “Word-based dialog state tracking with recurrent neural networks,” in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2014, pp. 292–299.
[25] J. Perez and F. Liu, “Dialog state tracking, a machine reading approach using memory network,” arXiv preprint arXiv:1606.04052, 2016.
[26] P. Xu and Q. Hu, “An end-to-end approach for handling unknown slot values in dialogue state tracking,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018.
[27] L. Ren, K. Xia, L. Chen, and K. Yu, “Towards universal dialogue state tracking,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
[28] A. Rastogi, R. Gupta, and D. Hakkani-Tur, “Multi-task learning for joint language understanding and dialogue state tracking,” in Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 2018, pp. 376–384.
[29] E. Nouri and E. Hosseini-Asl, “Toward scalable neural dialogue state tracking model,” in 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), 2nd Conversational AI workshop, 2018.
[30] V. Zhong, C. Xiong, and R. Socher, “Global-locally self-attentive dialogue state tracker,” arXiv preprint arXiv:1805.09655, 2018.
[31] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
