Towards Universal Dialogue Act Tagging for Task-Oriented Dialogues

Dilek Hakkani-Tür; Rahul Goel; Shachi Paul

arxiv: 1907.03020 · v1 · pith:YM76K64Ynew · submitted 2019-07-05 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-25 02:00 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords dialogue acts · task-oriented dialogues · universal schema · annotation alignment · semi-supervised tagging · human-human conversations · dialogue act tagging

The pith

A universal dialogue act schema aligns existing datasets so that a single tagger can label system turns in human-human task-oriented conversations at 54.1% F1 without new annotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to cut the expense of labeling large volumes of human-human dialogue data by defining one shared schema for dialogue acts and mapping several already-annotated task-oriented datasets onto it. Manual mappings plus automated alignment let the authors pool the data and train a Universal DA tagger. Applied to human-human conversations with no target-domain labels, the tagger reaches 54.1% F1 on system turns; in a semi-supervised setup the score rises to 57.7%, a level that would otherwise demand at least 1.7K manually annotated turns. For new domains, the authors report further gains when unlabeled or labeled target-domain data is available.

Core claim

The authors define a Universal DA schema for task-oriented dialogues, align multiple existing annotated datasets to it through a combination of manual and automated methods, and train a Universal DA tagger (U-DAT) on the resulting pooled data. Applied to human-human dialogues, the tagger obtains 54.1% F1 on system turns in a fully unsupervised setting and 57.7% F1 in a semi-supervised setting that would otherwise require at least 1.7K manually annotated turns; performance improves further when unlabeled or labeled target-domain data is supplied.

What carries the argument

The Universal DA schema together with the manual-plus-automated alignment procedure that converts labels from prior datasets into the common schema.

If this is right

Labeled task-oriented datasets can be reused across projects instead of being discarded when schemas differ.
New domains or customer-care logs become usable for training with far fewer than 1.7K new annotations.
Performance on target human-human data rises when even modest amounts of unlabeled or labeled target data are added.
The same aligned resource supports both human-machine and human-human tagging tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The alignment technique could be applied to other dialogue annotations such as slot values or user intents.
If the universal schema proves stable, the tagger might serve as a starting point for open-domain dialogue labeling.
Real-world customer logs could be run through the tagger to measure how often its output matches downstream system actions.

Load-bearing premise

Existing dialogue-act schemas can be mapped onto the proposed universal schema without introducing enough label noise or systematic bias to undermine training and evaluation on human-human data.

What would settle it

Collect a fresh sample of human-human turns, have experts label them directly with the universal schema, and check whether the tagger's F1 on those turns falls substantially below the reported 54.1% unsupervised figure.
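The check described above comes down to scoring predicted act sets against expert labels. A minimal sketch follows, assuming micro-averaged F1 over per-turn sets of acts; this page does not say which averaging the paper uses, and the function name `micro_f1` and the toy turns are illustrative only.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over per-turn sets of dialogue acts.

    Assumption: each turn may carry several acts, so every (turn, act)
    pair counts as one decision. The paper's exact F1 variant is not
    stated on this page.
    """
    tp = fp = fn = 0
    for gold_acts, pred_acts in zip(gold, pred):
        gold_acts, pred_acts = set(gold_acts), set(pred_acts)
        tp += len(gold_acts & pred_acts)   # acts predicted and correct
        fp += len(pred_acts - gold_acts)   # acts predicted but wrong
        fn += len(gold_acts - pred_acts)   # acts missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy expert labels and tagger output for four system turns (made up).
gold = [{"inform"}, {"request"}, {"affirm", "inform"}, {"bye"}]
pred = [{"inform"}, {"inform"}, {"affirm"}, {"bye"}]
score = micro_f1(gold, pred)
print(round(score, 3))
```

Comparing this score on freshly labeled turns with the reported 54.1% would show whether the alignment noise matters in practice.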

Original abstract

Machine learning approaches for building task-oriented dialogue systems require large conversational datasets with labels to train on. We are interested in building task-oriented dialogue systems from human-human conversations, which may be available in ample amounts in existing customer care center logs or can be collected from crowd workers. Annotating these datasets can be prohibitively expensive. Recently multiple annotated task-oriented human-machine dialogue datasets have been released, however their annotation schema varies across different collections, even for well-defined categories such as dialogue acts (DAs). We propose a Universal DA schema for task-oriented dialogues and align existing annotated datasets with our schema. Our aim is to train a Universal DA tagger (U-DAT) for task-oriented dialogues and use it for tagging human-human conversations. We investigate multiple datasets, propose manual and automated approaches for aligning the different schema, and present results on a target corpus of human-human dialogues. In unsupervised learning experiments we achieve an F1 score of 54.1% on system turns in human-human dialogues. In a semi-supervised setup, the F1 score increases to 57.7% which would otherwise require at least 1.7K manually annotated turns. For new domains, we show further improvements when unlabeled or labeled target domain data is available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper defines a universal DA schema, aligns several existing task-oriented datasets to it, and reports F1 scores of 54.1% unsupervised and 57.7% semi-supervised when tagging human-human dialogues.

The core advance is the universal schema plus the manual and automated alignment steps that let them pool labeled data from prior corpora and apply the resulting tagger to human-human turns. They show a concrete gain from adding a small amount of target-domain data and quantify the annotation savings. That is useful inside the subfield because schema differences have been a recurring headache when people try to combine DSTC-style and MultiWOZ-style annotations.

The experiments are straightforward and the numbers are presented without obvious circularity. The semi-supervised result is framed against the cost of manual labels, which is a practical way to think about it.

The main weakness is that the abstract gives no evidence on alignment quality: no agreement figures, no held-out accuracy on the mapping step, no error analysis of which acts get collapsed or mis-mapped. If the alignments introduce systematic noise, especially on system versus user turns or domain-specific intents, the downstream F1 numbers become hard to interpret. The paper would need to show that the universal labels are stable before the headline claims can be taken at face value.

This is the kind of work that belongs in a dialogue or conversational AI venue. Readers who actually build taggers or need to label new human-human logs will get the most out of it; the rest of us can skim the alignment details. It is worth sending to referees because the problem is real, the setup is reproducible in principle, and the gap they identify is worth closing even if the current validation is thin.

Referee Report

1 major / 1 minor

Summary. The paper proposes a Universal Dialogue Act (DA) schema for task-oriented dialogues, aligns multiple existing annotated human-machine dialogue datasets to this schema via manual and automated methods, trains a Universal DA Tagger (U-DAT), and evaluates it on a target corpus of human-human dialogues. It reports an unsupervised F1 of 54.1% on system turns that rises to 57.7% in a semi-supervised setting, claiming this performance would otherwise require at least 1.7K manual annotations.

Significance. If the alignment step produces labels free of substantial noise, the work would demonstrate a practical route to bootstrap DA taggers for human-human data from cheaper existing human-machine corpora, directly addressing the annotation bottleneck for task-oriented dialogue systems.

major comments (1)

[Alignment methods section] The section describing the manual and automated schema alignment provides no validation metrics (inter-annotator agreement, held-out accuracy against gold universal labels, or error analysis of collapsed acts). Because the headline unsupervised (54.1%) and semi-supervised (57.7%) F1 scores on the human-human target corpus are obtained by training on the aligned labels, the absence of any quality check on the alignment is load-bearing for the central claim.

minor comments (1)

[Abstract and Experiments] The abstract states headline F1 numbers without reference to model architecture, validation splits, or confidence intervals; these details should appear in the main experimental section for reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below.

Referee: [Alignment methods section] The section describing the manual and automated schema alignment provides no validation metrics (inter-annotator agreement, held-out accuracy against gold universal labels, or error analysis of collapsed acts). Because the headline unsupervised (54.1%) and semi-supervised (57.7%) F1 scores on the human-human target corpus are obtained by training on the aligned labels, the absence of any quality check on the alignment is load-bearing for the central claim.

Authors: We agree that explicit validation metrics for the alignment are necessary to support the central claim. In the revised manuscript we will report inter-annotator agreement on a sampled subset of the manual alignments, held-out accuracy of the automated alignment against a small set of gold universal labels, and a qualitative error analysis of the collapsed acts. These additions will be placed in the alignment section and will be independent of the downstream tagging results. revision: yes
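Inter-annotator agreement on a sample of manual alignments could be reported with a chance-corrected statistic. The sketch below assumes Cohen's kappa for two annotators who each assign one universal label per item; neither the referee nor the rebuttal names a specific statistic, and the labels shown are made up.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labeled independently
    # according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical universal labels assigned to six sampled source acts.
annotator_1 = ["inform", "request", "inform", "affirm", "request", "inform"]
annotator_2 = ["inform", "request", "affirm", "affirm", "request", "inform"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

A multi-label or multi-annotator setting would need a different statistic, such as Krippendorff's alpha.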

Circularity Check

0 steps flagged

No circularity: performance metrics are independent empirical outcomes

The paper defines a universal DA schema, describes manual/automated alignment of prior datasets to it, trains U-DAT on the aligned data, and reports F1 scores (54.1% unsupervised, 57.7% semi-supervised) on a separate target human-human corpus. None of these steps reduce by construction to the reported numbers; the F1 values are downstream results of training and evaluation rather than re-statements of fitted inputs, self-citations, or renamed patterns. The alignment step is a preprocessing choice whose correctness is an external assumption, not a definitional loop inside the derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the premise that dialogue act labels from heterogeneous datasets can be mapped to a single schema without critical loss of meaning; this premise is introduced in the abstract but not evidenced within the provided text.

axioms (1)

domain assumption: Existing dialogue act annotation schemas from multiple datasets can be aligned to a single universal schema without substantial information loss or bias
The paper states it aligns datasets with the proposed schema; this alignment step is required for the combined training data to be usable.

invented entities (1)

Universal DA schema (no independent evidence)
purpose: To provide a common label set that multiple prior datasets can be mapped onto
The schema is introduced by the authors as the foundation for alignment and tagging.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 3 internal anchors

[1]

DAs have been investigated by dialogue researchers for many years [2] and multiple taxonomies have been proposed [3, 4, 5] (see [6] for a review)

Introduction. Dialogue acts (DAs) aim to portray the meaning of utterances at the level of illocutionary force, capturing a speaker’s intention in producing that utterance [1]. DAs have been investigated by dialogue researchers for many years [2] and multiple taxonomies have been proposed [3, 4, 5] (see [6] for a review). Recent work in task-oriented...

work page
[2]

Towards Universal Dialogue Act Tagging for Task-Oriented Dialogues

Related Work. The mismatch between multiple DA taxonomies has been identified by [6] previously, where a subset of ISO 24617-2 (the international ISO standard for DA annotation) tags [17] have been identified and annotations of multiple corpora were mapped to this set, focusing on social conversations. Our work has a similar goal, but focuses on DAs th...

work page internal anchor Pith review Pith/arXiv arXiv 1907
[3]

$D = u_1, u_2, \ldots, u_N$ and $A$ be the predefined set of $M$ DAs i.e

DA Tagging for Dialogue Systems. Let a dialogue $D$ with $N$ turns be denoted as a series of user and system utterances, $u_i$, i.e. $D = u_1, u_2, \ldots, u_N$, and $A$ be the predefined set of $M$ DAs, i.e. $A = a_1, a_2, \ldots, a_M$. Given an utterance $u_i$ and its conversation history, DA tagging aims to predict the set of DAs $A_i \subset A$ of $u_i$. We use a deep neural network based model fo...

work page
[4]

A bi-directional LSTM to encode each utterance, $u_i = w^i_1, \ldots, w^i_{n_i}$, where $n_i$ denotes the number of tokens in $u_i$, and the final utterance representation $z_i$ is obtained by concatenating the last hidden layer of the forward LSTM, $\overrightarrow{\mathrm{LSTM}}$, and the first hidden layer of the backward LSTM, $\overleftarrow{\mathrm{LSTM}}$: $z_i = \overleftarrow{\mathrm{LSTM}}(u_i) \oplus$ ...

work page
[5]

A hierarchical, uni-directional LSTM to encode the dialogue level information, $e_i$: $e_i = \mathrm{LSTM}(z_1, \ldots, z_{i-1})$

work page
[6]

An indicator number, $g_i$, representing whether the agent is the user or the system, i.e., $g_i = 0$ if $u_i^{\mathrm{agent}} = \mathrm{user}$, $g_i = 1$ otherwise

work page
[7]

A DA vector is represented as a many-hot vector $d_i$ of dimension $M$, where we mark the true DAs as 1

Encoding over past DA(s) $p_i$, where the final representation is obtained by concatenating the many-hot representations of past DAs. A DA vector is represented as a many-hot vector $d_i$ of dimension $M$, where we mark the true DAs as 1: $p_i = d_1 \oplus d_2 \oplus \ldots \oplus d_{i-1}$. The final encoded context $C_i$ is given by: $C_i = e_i \oplus g_i \oplus p_i$ (2). $C_i$ is then fed into a feed forward ...

work page
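The context construction quoted in the excerpt above (dialogue encoding, speaker indicator, and concatenated many-hot vectors of past acts) can be sketched with plain Python lists. The schema below uses only the first five act names visible in the truncated Table 5 excerpt, the dialogue encoding is a placeholder rather than real LSTM output, and the helper names `many_hot` and `build_context` are hypothetical.

```python
def many_hot(acts, schema):
    """Many-hot vector of dimension M = len(schema): 1 for each act present."""
    return [1 if act in acts else 0 for act in schema]


def build_context(e_i, g_i, past_act_sets, schema):
    """Concatenate dialogue encoding, speaker indicator, and past-act vectors.

    Mirrors the excerpt's form C_i = e_i (+) g_i (+) p_i, where
    p_i = d_1 (+) ... (+) d_{i-1} and (+) denotes concatenation.
    """
    p_i = []
    for acts in past_act_sets:
        p_i.extend(many_hot(acts, schema))
    return list(e_i) + [g_i] + p_i


# Assumption: partial schema, taken from the truncated Table 5 excerpt.
SCHEMA = ["ack", "affirm", "bye", "deny", "inform"]
e_i = [0.2, -0.1, 0.7]                  # placeholder dialogue-level encoding
past = [{"inform"}, {"affirm", "inform"}]  # acts of the two previous turns
C_i = build_context(e_i, 1, past, SCHEMA)  # g_i = 1 marks a system turn
print(len(C_i))
```

Concatenating every past turn makes the vector grow with dialogue length; how the paper bounds or pads this is not visible in the excerpt.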
[8]

Therefore, we need a unified representation of all the acts present across the datasets

Datasets and Experiments. Our aim is to train a Universal DA tagger using public datasets, but the label spaces across these datasets are not aligned. Therefore, we need a unified representation of all the acts present across the datasets. We obtain this representation by manually going through the datasets and aligning semantically similar sentences ...

work page
[9]

for DAs. The GSim data has two parts and was collected by generating dialogue flows for movie (GSim-M) and restaurant (GSim-R) booking domains, where the individual turns from simulation in terms of DAs and associated arguments were then converted to natural language by crowd workers. DSTC2 contains human-machine interactions collected for the second di-...

work page
[10]

embeddings and fine-tune during training

work page
[11]

Universal DA Schema. 5.1. Union of acts based on namespace. In order to align the respective acts in the datasets (GSim and DSTC2), we first took a union of all the acts based on their names to create a unified representation. Figure 1 repre- [Figure 1: Distribution of system acts across datasets] [Table 2: Examples of manual alignment of acts in all datasets.] G...

work page
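The name-based union described in the excerpt above can be sketched as a mapping from each act name to the datasets that use it. The act inventories below are illustrative placeholders, not the datasets' actual label sets, and `union_of_acts` is a hypothetical helper.

```python
def union_of_acts(dataset_acts):
    """Union act names across datasets, recording where each name occurs."""
    unified = {}
    for dataset, acts in dataset_acts.items():
        for act in acts:
            unified.setdefault(act, set()).add(dataset)
    return unified


# Assumption: toy inventories for illustration only.
dataset_acts = {
    "GSim-R": {"inform", "offer", "select", "affirm"},
    "GSim-M": {"inform", "offer", "select", "affirm"},
    "DSTC2": {"inform", "offer", "select", "reqalts", "reqmore"},
}
unified = union_of_acts(dataset_acts)
for act in sorted(unified):
    print(act, sorted(unified[act]))
```

Names that appear in only one dataset are natural candidates for the manual alignment step, since matching on names alone cannot tell whether they duplicate an act that another dataset labels differently.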
[12]

We merge these acts

Mod1: offer/select - "I found a show for 7.30 pm" / "I found shows for 5 pm and 7 pm". We merge these acts

work page
[13]

Mod2: user-request/sys-request - "What is the phone number?" / "What kind of food would you like?" We merge these acts

work page
[14]

‘yes, 7pm’ can become affirm, inform(time=7pm) from affirm(time=7pm)

Mod3: affirm(x=y)/affirm + inform(x=y) - affirm with slots is equivalent to separate affirm and inform DAs, e.g. ‘yes, 7pm’ can become affirm, inform(time=7pm) from affirm(time=7pm). We split them

work page
[15]

We merged/split DAs like the aforementioned ones, as they can easily be restored using other information

Mod4: reqalts/reqmore - "Is there anything else?" / "Can I help you with anything else?" We merge these acts. We merged/split DAs like the aforementioned ones, as they can easily be restored using other information. For example, if multiple results are offered, we could convert an offer act to a select act, or depending on the agent, we can convert a request a...

work page
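The four modifications quoted above (Mod1 to Mod4) amount to a small label-normalization pass. The sketch below represents each act as a `(name, slots)` pair; the names chosen for the merged acts (`offer`, `request`, `reqmore`) are assumptions, since the excerpts do not show which name the schema keeps after each merge.

```python
# Assumption: target names for merged acts are hypothetical choices.
MERGE = {
    "select": "offer",          # Mod1: offer/select merged
    "user-request": "request",  # Mod2: user-request/sys-request merged
    "sys-request": "request",
    "reqalts": "reqmore",       # Mod4: reqalts/reqmore merged
}


def normalize(acts):
    """Apply the Mod1-Mod4 merges and the Mod3 split to one turn's acts."""
    normalized = []
    for name, slots in acts:
        name = MERGE.get(name, name)
        if name == "affirm" and slots:
            # Mod3: affirm(x=y) becomes affirm plus inform(x=y).
            normalized.append(("affirm", {}))
            normalized.append(("inform", dict(slots)))
        else:
            normalized.append((name, dict(slots)))
    return normalized


# 'yes, 7pm' annotated as affirm(time=7pm) in a source dataset.
turn = [("affirm", {"time": "7pm"})]
print(normalize(turn))
```

Because the merges discard distinctions such as speaker role or result count, restoring the original acts needs the extra context the excerpt mentions (the agent, or how many results were offered).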
[16]

This version of the dataset only has DAs for the system turns

DA Tagging of Human-Human Datasets. For experimenting with DA annotation of human-human (HH) dialogues, we used MultiWOZ-2.0 [13] as our dataset. This version of the dataset only has DAs for the system turns. To do an evaluation on MultiWOZ-2.0, we first need to [Footnote 1: Details in Appendix, Table 7] [Table 5: Universal DA schema] ack, affirm, bye, deny, inform, repea...

work page
[17]

In this work, we investigated multiple annotated human-machine conversation datasets, with differences in DA schema

Conclusions. We are interested in DA tagging of human-human conversations with the final goal of end-to-end training of task-oriented dialogue systems, so that we can generate system actions for a given dialogue context. In this work, we investigated multiple annotated human-machine conversation datasets, with differences in DA schema. We discussed ma...

work page
[18]

J. L. Austin, How to do things with words. Oxford university press, 1975

work page 1975
[19]

Dialogue act modeling for automatic tagging and recognition of conversational speech,

A. Stolcke, K. Ries, N. Coccaro, E. Shriberg, R. Bates, D. Jurafsky, P. Taylor, R. Martin, C. V. Ess-Dykema, and M. Meteer, “Dialogue act modeling for automatic tagging and recognition of conversational speech,” Computational linguistics, vol. 26, no. 3, pp. 339–373, 2000

work page 2000
[20]

The hcrc map task corpus,

A. H. Anderson, M. Bader, E. G. Bard, E. Boyle, G. Doherty, S. Garrod, S. Isard, J. Kowtko, J. McAllister, J. Miller et al., “The hcrc map task corpus,” Language and speech, vol. 34, no. 4, pp. 351–366, 1991

work page 1991
[21]

Coding dialogs with the damsl annotation scheme,

M. G. Core and J. Allen, “Coding dialogs with the damsl annotation scheme,” in AAAI fall symposium on communicative action in humans and machines, vol. 56. Boston, MA, 1997

work page 1997
[22]

The dit++ taxonomy for functional dialogue markup,

H. Bunt, “The dit++ taxonomy for functional dialogue markup,” in AAMAS 2009 Workshop, Towards a Standard Markup Language for Embodied Dialogue Acts, 2009, pp. 13–24

work page 2009
[23]

ISO-Standard Domain-Independent Dialogue Act Tagging for Conversational Agents

S. Mezza, A. Cervone, G. Tortoreto, E. A. Stepanov, and G. Riccardi, “Iso-standard domain-independent dialogue act tagging for conversational agents,” arXiv preprint arXiv:1806.04327, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[24]

Cued standard dialogue acts,

S. Young, “Cued standard dialogue acts,” Report, Cambridge University Engineering Department, 14th October, vol. 2007, 2007

work page 2007
[25]

Towards an iso standard for dialogue act annotation,

H. Bunt, J. Alexandersson, J. Carletta, J.-W. Choe, A. C. Fang, K. Hasida, K. Lee, V. Petukhova, A. Popescu-Belis, L. Romary et al., “Towards an iso standard for dialogue act annotation,” in Seventh conference on International Language Resources and Evaluation (LREC’10), 2010

work page 2010
[26]

Bootstrapping a neural conversational agent with dialogue self-play, crowdsourcing and on-line reinforcement learning,

P. Shah, D. Hakkani-Tür, B. Liu, and G. Tur, “Bootstrapping a neural conversational agent with dialogue self-play, crowdsourcing and on-line reinforcement learning,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL/HLT), vol. 3, 2018, pp. 41–51

work page 2018
[27]

Back-off action selection in summary space-based pomdp dialogue systems,

M. Gašić, F. Lefevre, F. Jurčíček, S. Keizer, F. Mairesse, B. Thomson, K. Yu, and S. Young, “Back-off action selection in summary space-based pomdp dialogue systems,” in IEEE Workshop on Automatic Speech Recognition & Understanding. IEEE, 2009, pp. 456–461

work page 2009
[28]

Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning,

J. D. Williams, K. Asadi, and G. Zweig, “Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning,” in 55th Annual Meeting of the Association for Computational Linguistics (ACL), 2017

work page 2017
[29]

Dialogue learning with human teaching and feedback in end-to-end trainable task-oriented dialogue systems,

B. Liu, G. Tur, D. Hakkani-Tür, P. Shah, and L. Heck, “Dialogue learning with human teaching and feedback in end-to-end trainable task-oriented dialogue systems,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL/HLT), 2018

work page 2018
[30]

Multiwoz - a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling,

P. Budzianowski, T.-H. Wen, B.-H. Tseng, I. Casanueva, S. Ultes, O. Ramadan, and M. Gašić, “Multiwoz - a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling,” arXiv preprint arXiv:1810.00278, 2018

work page arXiv 2018
[31]

Edina: Building an open domain socialbot with self-dialogues,

B. Krause, M. Damonte, M. Dobre, D. Duma, J. Fainberg, F. Fancellu, E. Kahembwe, J. Cheng, and B. Webber, “Edina: Building an open domain socialbot with self-dialogues,” Alexa Prize Proceedings, 2017

work page 2017
[32]

Switchboard: Telephone speech corpus for research and development,

J. J. Godfrey, E. C. Holliman, and J. McDaniel, “Switchboard: Telephone speech corpus for research and development,” in icassp. IEEE, 1992, pp. 517–520

work page 1992
[33]

The second dialog state tracking challenge,

M. Henderson, B. Thomson, and J. D. Williams, “The second dialog state tracking challenge,” in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2014, pp. 263–272

work page 2014
[34]

The semantics of dialogue acts,

H. Bunt, “The semantics of dialogue acts,” in Proceedings of the Ninth International Conference on Computational Semantics. Association for Computational Linguistics, 2011, pp. 1–13

work page 2011
[35]

Automatic dialog act segmentation and classification in multiparty meetings,

J. Ang, Y. Liu, and E. Shriberg, “Automatic dialog act segmentation and classification in multiparty meetings,” in Proceedings (ICASSP’05), IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005, vol. 1. IEEE, 2005, pp. I–1061

work page 2005
[36]

Joint segmentation and classification of dialog acts using conditional random fields,

M. Zimmermann, “Joint segmentation and classification of dialog acts using conditional random fields,” in Tenth Annual Conference of the International Speech Communication Association, 2009

work page 2009
[37]

Dialog act tagging using graphical models,

G. Ji and J. Bilmes, “Dialog act tagging using graphical models,” in Proceedings (ICASSP’05), IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005, vol. 1. IEEE, 2005, pp. I–33

work page 2005
[38]

Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks

J. Y. Lee and F. Dernoncourt, “Sequential short-text classification with recurrent and convolutional neural networks,” arXiv preprint arXiv:1603.03827, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[39]

Using context information for dialog act classification in dnn framework,

Y. Liu, K. Han, Z. Tan, and Y. Lei, “Using context information for dialog act classification in dnn framework,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 2170–2178

work page 2017
[40]

The icsi meeting recorder dialog act (mrda) corpus,

E. Shriberg, R. Dhillon, S. Bhagat, J. Ang, and H. Carvey, “The icsi meeting recorder dialog act (mrda) corpus,” in Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004, 2004

work page 2004
[41]

Domain adaptation with unlabeled data for dialog act tagging,

A. Margolis, K. Livescu, and M. Ostendorf, “Domain adaptation with unlabeled data for dialog act tagging,” in Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing. Association for Computational Linguistics, 2010, pp. 45–52

work page 2010
[42]

Enriching word vectors with subword information,

P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, 2017.

A. Appendix

Table 7: Alignment of Datasets with Universal DA Schema

GSim-R | GSim-M | DSTC2 | Universal DA Schema
inform(x=y) | inform(x=y) | inform(x=y) | inform(x=y)
reque...

work page 2017

[1] [1]

DAs have been investi- gated by dialogue researchers for many years [2] and multi- ple taxonomies have been proposed [3, 4, 5] (see [6] for a review)

Introduction Dialogue acts (DAs) aim to portray the meaning of utterances at the level of illocutionary force, capturing a speaker’s inten- tion in producing that utterance [1]. DAs have been investi- gated by dialogue researchers for many years [2] and multi- ple taxonomies have been proposed [3, 4, 5] (see [6] for a review). Recent work in task-oriented...

work page

[2] [2]

Towards Universal Dialogue Act Tagging for Task-Oriented Dialogues

Related Work The mismatch between multiple DA taxonomies has been iden- tiﬁed by [6] previously, where a subset of ISO 24617-2 (the in- ternational ISO standard for DA annotation) tags [17] have been identiﬁed and annotations of multiple corpora were mapped to this set, focusing on social conversations. Our work has a simi- lar goal, but focuses on DAs th...

work page internal anchor Pith review Pith/arXiv arXiv 1907

[3] [3]

D = u1, u2, ..., uN and A be the predeﬁned set of M DAs i.e

DA Tagging for Dialogue Systems Let a dialogue D with N turns be denoted as a series of user and system utterances, ui, i.e. D = u1, u2, ..., uN and A be the predeﬁned set of M DAs i.e. A = a1, a2..., aM . Given an utterance ui and its conversation history, DA tagging aims to predict the set of DAs Ai ⊂ A of ui. We use a deep neural network based model fo...

work page

[4] [4]

A bi-directional LSTM to encode each utterance, ui = wi 1, ..., wi ni, where ni denotes the number of tokens in ui and the ﬁnal utterance representation for utterance zi is obtained by concatenating the last hidden layer of the forward LSTM, − − − − →LST M, and the ﬁrst hidden layer of the backward LSTM, ← − − − −LST M: zi = ← − − − −LST M (ui) ⊕ − − − − ...

work page

[5] [5]

A hierarchical, uni-directional LSTM to encode the dia- logue level information, ei: ei = LST M (z1, ...zi−1)

work page

[6] [6]

An indicator number, gi, representing whether the agent is user or the system, i.e., gi = 0 , if uagent i = user, gi = 1 otherwise

work page

[7] [7]

A DA vector is represented as a many-hot vector di of dimension M, where we mark the true DAs as 1

Encoding over past DA(s) pi, where the ﬁnal represen- tation is obtained by concatenating the many-hot repre- sentations of past-DAs. A DA vector is represented as a many-hot vector di of dimension M, where we mark the true DAs as 1. pi = d1 ⊕ d2 ⊕ ... ⊕ di−1 The ﬁnal encoded contextCi is given by: Ci = ei ⊕ gi ⊕ pi (2) Ci is then fed into a feed forward ...

work page

[8] [8]

There- fore, we need a uniﬁed representation of all the acts present across the datasets

Datasets and Experiments Our aim is to train a Universal DA tagger using public datasets, but the label spaces across these datasets are not aligned. There- fore, we need a uniﬁed representation of all the acts present across the datasets. We obtain this representation by manu- ally going through the datasets and aligning semantically simi- lar sentences ...

work page

[9] [9]

for DAs. The GSim data has two parts and was collected by generating dialogue ﬂows for movie (GSim-M) and restaurant (GSim-R) booking domains, where the individual turns from simulation in terms of DAs and associated arguments were then converted to natural language by crowd workers. DSTC2 con- tains human-machine interactions collected for the second di-...

work page

[10] [10]

embeddings and ﬁne-tune during training

work page

[11] [11]

Universal DA Schema 5.1. Union of acts based on namespace In order to align the respective acts in the datasets (GSim and DSTC2), we ﬁrst took a union of all the acts based on their names to create a uniﬁed representation. Figure 1 repre- Figure 1: Distribution of system acts across datasets Table 2: Examples of manual alignment of acts in all datasets. G...

work page

[12] [12]

We merge these acts

Mod1: offer/select- I found a show for 7.30 pm/I found shows for 5 pm and 7 pm. We merge these acts

work page

[13] [13]

Mod2: user-request/sys-request - What is the phone number?/What kind of food would you like? We merge these acts

work page

[14] [14]

‘yes, 7pm’ can become afﬁrm, inform(time=7pm) from afﬁrm(time=7pm)

Mod3: afﬁrm(x=y)/afﬁrm + inform(x=y) - afﬁrm with slots is equivalent to separate afﬁrm and inform DAs, for eg. ‘yes, 7pm’ can become afﬁrm, inform(time=7pm) from afﬁrm(time=7pm). We split them

work page

[15] [15]

We merged/split DAs like the aforementioned ones, as they can easily be restored using other information

Mod4: reqalts/reqmore - Is there anything else?/Can i help you with anything else? We merge these acts. We merged/split DAs like the aforementioned ones, as they can easily be restored using other information. For example, if mul- tiple results are offered, we could convert anoffer act to a select act, or depending on the agent, we can convert a request a...

work page

[16] [16]

This ver- sion of the dataset only has DAs for the system turns

DA Tagging of Human-Human Datasets For experimenting with DA annotation of human-human (HH) dialogues, we used MultiWOZ-2.0[13] as our dataset. This ver- sion of the dataset only has DAs for the system turns. To do an evaluation on MultiWOZ-2.0, we ﬁrst need to 1Details in Appendix, Table 7 Table 5: Universal DA schema ack, afﬁrm, bye, deny, inform, repea...

work page

[17] [17]

In this work, we investigated multi- ple annotated human-machine conversation datasets, with dif- ferences in DA schema

Conclusions We are interested in DA tagging of human-human conversations with the ﬁnal goal of end-to-end training of task-oriented di- alogue systems, so that we can generate system actions for a given dialogue context. In this work, we investigated multi- ple annotated human-machine conversation datasets, with dif- ferences in DA schema. We discussed ma...

work page

[18] [18]

J. L. Austin, How to do things with words . Oxford university press, 1975

work page 1975

[19] [19]

Dialogue act modeling for automatic tagging and recognition of conversational speech,

A. Stolcke, K. Ries, N. Coccaro, E. Shriberg, R. Bates, D. Ju- rafsky, P. Taylor, R. Martin, C. V . Ess-Dykema, and M. Meteer, “Dialogue act modeling for automatic tagging and recognition of conversational speech,” Computational linguistics, vol. 26, no. 3, pp. 339–373, 2000

work page 2000

[20] A. H. Anderson, M. Bader, E. G. Bard, E. Boyle, G. Doherty, S. Garrod, S. Isard, J. Kowtko, J. McAllister, J. Miller et al., “The HCRC map task corpus,” Language and Speech, vol. 34, no. 4, pp. 351–366, 1991.

[21] M. G. Core and J. Allen, “Coding dialogs with the DAMSL annotation scheme,” in AAAI Fall Symposium on Communicative Action in Humans and Machines, vol. 56. Boston, MA, 1997.

[22] H. Bunt, “The DIT++ taxonomy for functional dialogue markup,” in AAMAS 2009 Workshop, Towards a Standard Markup Language for Embodied Dialogue Acts, 2009, pp. 13–24.

[23] S. Mezza, A. Cervone, G. Tortoreto, E. A. Stepanov, and G. Riccardi, “ISO-standard domain-independent dialogue act tagging for conversational agents,” arXiv preprint arXiv:1806.04327, 2018.

[24] S. Young, “CUED standard dialogue acts,” Report, Cambridge University Engineering Department, 14th October, vol. 2007, 2007.

[25] H. Bunt, J. Alexandersson, J. Carletta, J.-W. Choe, A. C. Fang, K. Hasida, K. Lee, V. Petukhova, A. Popescu-Belis, L. Romary et al., “Towards an ISO standard for dialogue act annotation,” in Seventh Conference on International Language Resources and Evaluation (LREC’10), 2010.

[26] P. Shah, D. Hakkani-Tür, B. Liu, and G. Tur, “Bootstrapping a neural conversational agent with dialogue self-play, crowdsourcing and on-line reinforcement learning,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL/HLT), vol. 3, 2018, pp. 41–51.

[27] M. Gašić, F. Lefevre, F. Jurčíček, S. Keizer, F. Mairesse, B. Thomson, K. Yu, and S. Young, “Back-off action selection in summary space-based POMDP dialogue systems,” in IEEE Workshop on Automatic Speech Recognition & Understanding. IEEE, 2009, pp. 456–461.

[28] J. D. Williams, K. Asadi, and G. Zweig, “Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning,” in 55th Annual Meeting of the Association for Computational Linguistics (ACL), 2017.

[29] B. Liu, G. Tur, D. Hakkani-Tür, P. Shah, and L. Heck, “Dialogue learning with human teaching and feedback in end-to-end trainable task-oriented dialogue systems,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL/HLT), 2018.

[30] P. Budzianowski, T.-H. Wen, B.-H. Tseng, I. Casanueva, S. Ultes, O. Ramadan, and M. Gašić, “MultiWOZ - a large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling,” arXiv preprint arXiv:1810.00278, 2018.

[31] B. Krause, M. Damonte, M. Dobre, D. Duma, J. Fainberg, F. Fancellu, E. Kahembwe, J. Cheng, and B. Webber, “Edina: Building an open domain socialbot with self-dialogues,” Alexa Prize Proceedings, 2017.

[32] J. J. Godfrey, E. C. Holliman, and J. McDaniel, “Switchboard: Telephone speech corpus for research and development,” in ICASSP. IEEE, 1992, pp. 517–520.

[33] M. Henderson, B. Thomson, and J. D. Williams, “The second dialog state tracking challenge,” in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2014, pp. 263–272.

[34] H. Bunt, “The semantics of dialogue acts,” in Proceedings of the Ninth International Conference on Computational Semantics. Association for Computational Linguistics, 2011, pp. 1–13.

[35] J. Ang, Y. Liu, and E. Shriberg, “Automatic dialog act segmentation and classification in multiparty meetings,” in Proceedings (ICASSP’05), IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005, vol. 1. IEEE, 2005, pp. I–1061.

[36] M. Zimmermann, “Joint segmentation and classification of dialog acts using conditional random fields,” in Tenth Annual Conference of the International Speech Communication Association, 2009.

[37] G. Ji and J. Bilmes, “Dialog act tagging using graphical models,” in Proceedings (ICASSP’05), IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005, vol. 1. IEEE, 2005, pp. I–33.

[38] J. Y. Lee and F. Dernoncourt, “Sequential short-text classification with recurrent and convolutional neural networks,” arXiv preprint arXiv:1603.03827, 2016.

[39] Y. Liu, K. Han, Z. Tan, and Y. Lei, “Using context information for dialog act classification in DNN framework,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 2170–2178.

[40] E. Shriberg, R. Dhillon, S. Bhagat, J. Ang, and H. Carvey, “The ICSI meeting recorder dialog act (MRDA) corpus,” in Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004, 2004.

[41] A. Margolis, K. Livescu, and M. Ostendorf, “Domain adaptation with unlabeled data for dialog act tagging,” in Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing. Association for Computational Linguistics, 2010, pp. 45–52.

[42] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, 2017.

A. Appendix

Table 7: Alignment of Datasets with Universal DA Schema

GSim-R | GSim-M | DSTC2 | Universal DA Schema
inform(x=y) | inform(x=y) | inform(x=y) | inform(x=y)
reque...