Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents

Grace Hui Yang; Shihao Wang; Xubo Lin; Yang Deng; Zezhi Deng

arxiv: 2605.14057 · v2 · pith:QLSM76VMnew · submitted 2026-05-13 · 💻 cs.CL

Xubo Lin, Zezhi Deng, Shihao Wang, Grace Hui Yang, Yang Deng

Pith reviewed 2026-05-19 13:43 UTC · model grok-4.3

classification 💻 cs.CL

keywords Inquisitive Conversational Agents · Dual Hierarchical Reinforcement Learning · Legal Dialogue Systems · Probing Questions · Supreme Court Oral Arguments · Dialogue Policy Learning · Information Extraction

The pith

A dual hierarchical reinforcement learning framework lets agents learn to ask probing questions that emulate judicial patterns and extract key legal information.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Most dialogue systems simply respond to user requests, yet many real-world tasks require an agent to actively draw out information to reach its own goals. This paper introduces Inquisitive Conversational Agents and builds one for U.S. Supreme Court oral arguments. It presents a dual hierarchical reinforcement learning setup in which two cooperating agents share the work: one manages high-level dialogue strategy while the other produces specific utterances. The agents learn, through rewards, when and how to pose questions that mirror effective judicial questioning and thereby uncover facts needed for legal objectives. On a Supreme Court dataset the approach beats several baselines on standard performance measures.

Core claim

We propose a Dual Hierarchical Reinforcement Learning framework featuring two cooperating RL agents, each with its own policy, to coordinate strategic dialogue management and fine-grained utterance generation. By learning when and how to ask probing questions, the agent emulates judicial questioning patterns and systematically uncovers crucial information to fulfill its legal objectives.

What carries the argument

Dual Hierarchical Reinforcement Learning framework consisting of two cooperating agents, one for strategic dialogue management and one for fine-grained utterance generation.
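The division of labour between the two agents can be illustrated with a toy example. This is a minimal sketch, not the paper's method: the strategy and template labels, the `toy_reward` function, and the tabular running-average update are all assumptions made for illustration, standing in for whatever state representation, action taxonomy, and RL algorithm the authors actually use. It shows only the general pattern of a high-level policy choosing a strategy, a low-level policy choosing a realisation conditioned on that strategy, and both learning from one shared reward.

```python
import random
from dataclasses import dataclass, field

# Hypothetical labels: this page does not list the paper's action taxonomy.
STRATEGIES = ["probe_fact", "challenge_with_counterexample", "press_for_clarification"]
TEMPLATES = ["wh_question", "yes_no_question", "hypothetical_scenario"]


@dataclass
class TabularPolicy:
    """Epsilon-greedy policy over a fixed action list with a running value table."""
    actions: list
    epsilon: float = 0.3
    lr: float = 0.5
    q: dict = field(default_factory=dict)

    def greedy(self, state):
        # Ties resolve to the earliest action in the list.
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def act(self, state, rng):
        if rng.random() < self.epsilon:
            return rng.choice(self.actions)
        return self.greedy(state)

    def update(self, state, action, reward):
        # Running average of observed rewards (a bandit-style update,
        # used here in place of a full RL algorithm).
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.lr * (reward - old)


def toy_reward(phase, strategy, template):
    # Hypothetical stand-in for a real reward: one good pair per dialogue phase.
    target = {
        "opening": ("probe_fact", "wh_question"),
        "follow_up": ("press_for_clarification", "yes_no_question"),
    }
    return 1.0 if target[phase] == (strategy, template) else 0.0


def train(episodes=20000, seed=0):
    rng = random.Random(seed)
    manager = TabularPolicy(STRATEGIES)    # high-level: which strategy to pursue
    generator = TabularPolicy(TEMPLATES)   # low-level: how to phrase the turn
    for _ in range(episodes):
        phase = rng.choice(["opening", "follow_up"])
        strategy = manager.act(phase, rng)
        # The low-level policy conditions on the high-level choice.
        template = generator.act((phase, strategy), rng)
        reward = toy_reward(phase, strategy, template)
        # Both policies learn from the same shared reward signal.
        manager.update(phase, strategy, reward)
        generator.update((phase, strategy), template, reward)
    return manager, generator
```

After training, `manager.greedy("follow_up")` returns the strategy the high-level policy prefers in that phase, and `generator.greedy(("follow_up", strategy))` returns the matching low-level choice. A real system would replace the template labels with generated utterances.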

If this is right

The learned policies allow the agent to emulate judicial questioning patterns and systematically gather crucial facts.
The method outperforms various baselines across multiple metrics on the U.S. Supreme Court dataset.
The framework marks an initial step toward high-stakes, domain-specific applications that require proactive information extraction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same two-agent coordination could be tried in medical or financial dialogues where an agent must gather sensitive history without direct user prompting.
Pairing the hierarchical policies with a large pre-trained language model might raise the fluency and relevance of the generated questions.
Live interactive simulations with human lawyers could test whether the learned strategies hold up when the other speaker deviates from training data patterns.

Load-bearing premise

The U.S. Supreme Court oral arguments dataset contains representative examples of effective judicial questioning that the dual RL agents can learn to replicate through reward signals.

What would settle it

A held-out test on Supreme Court dialogues in which the dual-agent system shows no improvement over baselines in metrics that track information uncovered or quality of probing questions.
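One way such a held-out comparison could be run is a paired bootstrap over per-dialogue scores. The sketch that follows is an assumption about how a reader might check the claim, not a procedure taken from the paper. The function name `paired_bootstrap` and its parameters are hypothetical, and the scores could be any per-dialogue metric, for example the Coverage Score named in the figure list.

```python
import random


def paired_bootstrap(system_scores, baseline_scores, n_resamples=2000, seed=0):
    """Return (mean difference, fraction of resamples where the system is not better).

    Scores are paired: index i in both lists refers to the same held-out dialogue.
    """
    if len(system_scores) != len(baseline_scores) or not system_scores:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    n = len(system_scores)
    diffs = [s - b for s, b in zip(system_scores, baseline_scores)]
    observed = sum(diffs) / n
    not_better = 0
    for _ in range(n_resamples):
        # Resample dialogues with replacement, keeping each pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n <= 0:
            not_better += 1
    return observed, not_better / n_resamples
```

A fraction near 1.0 would correspond to the "no improvement over baselines" outcome described above, while a fraction near 0.0 would indicate that the gain is stable across resampled test sets.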

Figures

Figures reproduced from arXiv: 2605.14057 by Grace Hui Yang, Shihao Wang, Xubo Lin, Yang Deng, Zezhi Deng.

**Figure 2.** System Architecture of the Proposed Dual Hierarchical Inquisitive Conversational Agent.

**Figure 3.** Coverage Score results

**Figure 4.** MR Score results

**Figure 5.** A template of Google form for manual label

**Figure 6.** Justice uses a counterexample to challenge

**Figure 7.** Justice continuously pressing attorney by

**Figure 8.** Learning Curves from Ablation Study. (a) Cumulative reward during early training stage; (b) Cumulative

Original abstract

Most existing dialogue systems are user-driven, primarily designed to fulfill user requests. However, in many critical real-world scenarios, a conversational agent must proactively extract information to achieve its own objectives rather than merely respond. To address this gap, we introduce Inquisitive Conversational Agents (ICAs) and develop an ICA specifically tailored to U.S. Supreme Court oral arguments. We propose a Dual Hierarchical Reinforcement Learning framework featuring two cooperating RL agents, each with its own policy, to coordinate strategic dialogue management and fine-grained utterance generation. By learning when and how to ask probing questions, the agent emulates judicial questioning patterns and systematically uncovers crucial information to fulfill its legal objectives. Evaluations on a U.S. Supreme Court dataset show that our method outperforms various baselines across multiple metrics. It represents an important first step toward broader high-stakes, domain-specific applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Inquisitive Conversational Agents (ICAs) for proactive information extraction in high-stakes settings, focusing on U.S. Supreme Court oral arguments. It proposes a Dual Hierarchical Reinforcement Learning framework with two cooperating RL agents—one for strategic dialogue management and one for fine-grained utterance generation—to learn when and how to ask probing questions that emulate judicial patterns and uncover crucial information. The central empirical claim is that this approach outperforms various baselines across multiple metrics on a U.S. Supreme Court dataset.

Significance. If the empirical results and reward design hold under scrutiny, the dual hierarchical RL coordination for objective-driven dialogue could advance proactive conversational agents in legal and similar domains. The framework's separation of high-level strategy from utterance generation offers a structured way to handle complex, multi-turn objectives without relying solely on user-driven responses.

major comments (2)

[Abstract] The assertion that 'our method outperforms various baselines across multiple metrics' supplies no quantitative results, baseline descriptions, evaluation metrics, statistical significance tests, or controls. This directly undermines verification of the central empirical claim and must be addressed with full experimental details.
[Method] Reward design (likely §4): The reward signals used to train the agents on 'systematically uncovers crucial information' are not shown to be definable from raw dialogue turns or generic success metrics alone. If the rewards incorporate hand-crafted legal features, issue coverage heuristics, or expert-derived labels, this contradicts the claim of learning judicial questioning patterns without extensive additional supervision or domain rules.

minor comments (2)

[Abstract] Specify the size, preprocessing, and annotation details of the U.S. Supreme Court oral arguments dataset, including how 'crucial information' is operationalized for evaluation.
[Method] Clarify the exact coordination mechanism between the two RL agents (e.g., shared state, hierarchical action selection, or joint optimization) to make the dual-policy architecture reproducible.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We appreciate the referee's detailed review and constructive suggestions for improving our manuscript. Below, we provide point-by-point responses to the major comments. We have revised the manuscript to address the concerns raised where possible.

Referee: [Abstract] The assertion that 'our method outperforms various baselines across multiple metrics' supplies no quantitative results, baseline descriptions, evaluation metrics, statistical significance tests, or controls. This directly undermines verification of the central empirical claim and must be addressed with full experimental details.

Authors: The abstract serves as a concise summary, and space limitations prevent inclusion of full details. The complete experimental results, including quantitative metrics, baseline descriptions, evaluation metrics, and statistical significance tests, are detailed in Section 5. We will revise the abstract to incorporate key quantitative findings to better support the central claim.

Revision: yes

Referee: [Method] Reward design (likely §4): The reward signals used to train the agents on 'systematically uncovers crucial information' are not shown to be definable from raw dialogue turns or generic success metrics alone. If the rewards incorporate hand-crafted legal features, issue coverage heuristics, or expert-derived labels, this contradicts the claim of learning judicial questioning patterns without extensive additional supervision or domain rules.

Authors: The reward design uses only generic metrics computable from raw dialogue turns, such as the degree to which key information is uncovered in the conversation. No hand-crafted features or expert labels are involved; the dataset provides the basis for evaluating success. We will add a subsection detailing the reward computation to clarify this aspect and resolve any ambiguity.

Revision: yes
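To make the disagreement concrete, here is a minimal sketch of what a per-turn reward computed from dialogue text could look like. It is not the paper's reward. The three component names (Solicitation of Goal-Relevant Information, Novelty, Succinct Answer) are taken from the passage quoted elsewhere on this page, but the token-overlap scoring, the weights, the length cap, and the function name `turn_reward` are all assumptions. Note that `goal_terms` is supplied by hand in this sketch. Whether the paper's reward needs a comparable input is the question the referee raises.

```python
import re


def _words(text):
    # Lowercased alphanumeric tokens; a deliberately crude tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def turn_reward(answer, goal_terms, seen_terms,
                max_answer_tokens=40, weights=(1.0, 1.0, 0.5)):
    """Score one attorney answer elicited by the agent's question.

    goal_terms: lowercase terms the agent wants to surface (hypothetical input).
    seen_terms: goal terms already surfaced in earlier turns.
    Returns (reward, goal terms found in this answer).
    """
    words = _words(answer)
    goal = set(goal_terms)
    relevant = set(words) & goal

    # Solicitation: share of goal terms that this answer mentions.
    solicitation = len(relevant) / len(goal) if goal else 0.0
    # Novelty: share of the mentioned goal terms not seen before.
    novelty = len(relevant - set(seen_terms)) / len(relevant) if relevant else 0.0
    # Succinctness: full credit up to the cap, then decaying with length.
    if len(words) <= max_answer_tokens:
        succinct = 1.0
    else:
        succinct = max_answer_tokens / len(words)

    w_s, w_n, w_c = weights
    return w_s * solicitation + w_n * novelty + w_c * succinct, relevant
```

Calling `turn_reward` turn by turn and adding each returned set to `seen_terms` lowers the novelty term for repeated information, which is one simple way a reward could discourage redundant questions.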

Circularity Check

0 steps flagged

No circularity: independent empirical proposal with no self-referential derivations

The paper proposes a Dual Hierarchical Reinforcement Learning framework for Inquisitive Conversational Agents as a novel architecture, with two cooperating RL agents for dialogue management and utterance generation. It evaluates this on a U.S. Supreme Court oral arguments dataset and reports outperformance over baselines. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claim is an independent modeling choice and empirical result rather than a quantity derived from its own inputs by construction, satisfying the self-contained criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, no free parameters, axioms, or invented entities are explicitly detailed. Standard RL assumptions such as reward design are likely present but unspecified.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel

Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: Dual Hierarchical Reinforcement Learning framework featuring two cooperating RL agents... reward comprising Solicitation of Goal-Relevant Information, Novelty, Succinct Answer... three-level hierarchical action taxonomy

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

206 extracted references · 206 canonical work pages · 21 internal anchors

[1]

and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Chen, Weizhu , booktitle =

Hu, Edward J. and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Chen, Weizhu , booktitle =. 2022 , url =

work page 2022
[2]

Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue , year =

Tiancheng Zhao and Maxine Eskenazi , title =. Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue , year =

work page
[3]

Multiwoz-a large-scale multi- domain wizard-of-oz dataset for task-oriented dialogue mod- elling,

Paweł Budzianowski and Tsung-Hsien Wen and Bo-Hsiang Tseng and Iñigo Casanueva and Stefan Ultes and Osman Ramadan and Milica Gašić , title =. arXiv preprint arXiv:1810.00278 , year =

work page arXiv
[4]

Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue , year =

Matthew Henderson and Blaise Thomson and Jason Williams and Steve Young , title =. Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue , year =

work page
[5]

Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

Jiwei Li and Will Monroe and Alan Ritter and Michel Galley and Jianfeng Gao and Dan Jurafsky , title =. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =

work page 2016
[6]

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics , year =

Baolin Peng and Xiujun Li and Jianfeng Gao and Jingjing Liu and Kam-Fai Wong and Shang-Yu Su , title =. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics , year =

work page
[7]

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) , year =

Counterfactual Multi-Agent Policy Gradients , author =. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) , year =

work page
[8]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =

work page
[9]

and Whiteson, Shimon , booktitle =

Rashid, Tabish and Samvelyan, Mikayel and Schroeder de Witt, Christian and Farquhar, Gregory and Foerster, Jakob N. and Whiteson, Shimon , booktitle =. 2018 , publisher =

work page 2018
[10]

, editor =

Henderson, Matthew and Thomson, Blaise and Williams, Jason D. , editor =. The Second Dialog State Tracking Challenge , url =. Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (. doi:10.3115/v1/W14-4337 , eventtitle =

work page doi:10.3115/v1/w14-4337
[11]

, urldate =

Henderson, Matthew and Thomson, Blaise and Williams, Jason D. , urldate =. The third Dialog State Tracking Challenge , isbn =. 2014. doi:10.1109/SLT.2014.7078595 , eventtitle =

work page doi:10.1109/slt.2014.7078595 2014
[12]

The Dialog State Tracking Challenge , url =

Williams, Jason and Raux, Antoine and Ramachandran, Deepak and Black, Alan , editor =. The Dialog State Tracking Challenge , url =. Proceedings of the

work page
[13]

Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset , rights =

Rastogi, Abhinav and Zang, Xiaoxue and Sunkara, Srinivas and Gupta, Raghav and Khaitan, Pranav , urldate =. Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset , rights =. doi:10.48550/ARXIV.1909.05855 , shorttitle =

work page doi:10.48550/arxiv.1909.05855 1909
[14]

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Ganguli, Deep and Lovitt, Liane and Kernion, Jackson and Askell, Amanda and Bai, Yuntao and Kadavath, Saurav and Mann, Ben and Perez, Ethan and Schiefer, Nicholas and Ndousse, Kamal and Jones, Andy and Bowman, Sam and Chen, Anna and Conerly, Tom and. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned , rights =. d...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2209.07858
[15]

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Bai, Yuntao and Jones, Andy and Ndousse, Kamal and Askell, Amanda and Chen, Anna and. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback , rights =. doi:10.48550/ARXIV.2204.05862 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2204.05862
[16]

Proactive Conversational Agents in the Post-

Liao, Lizi and Yang, Grace Hui and Shah, Chirag , urldate =. Proactive Conversational Agents in the Post-. Proceedings of the 46th International. doi:10.1145/3539618.3594250 , eventtitle =

work page doi:10.1145/3539618.3594250
[17]

Andrew and Abbeel, Pieter and Peters, Jan , urldate =

Osa, Takayuki and Pajarinen, Joni and Neumann, Gerhard and Bagnell, J. Andrew and Abbeel, Pieter and Peters, Jan , urldate =. An Algorithmic Perspective on Imitation Learning , volume =. doi:10.1561/2300000053 , pages =

work page doi:10.1561/2300000053
[18]

Inquisitive mind: a conversational news companion , isbn =

Dubiel, Mateusz and Cervone, Alessandra and Riccardi, Giuseppe , urldate =. Inquisitive mind: a conversational news companion , isbn =. Proceedings of the 1st International Conference on Conversational User Interfaces , publisher =. doi:10.1145/3342775.3342802 , shorttitle =

work page doi:10.1145/3342775.3342802
[19]

Key-Value Retrieval Networks for Task-Oriented Dialogue

Eric, Mihail and Manning, Christopher D. , urldate =. Key-Value Retrieval Networks for Task-Oriented Dialogue , rights =. doi:10.48550/ARXIV.1705.05414 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1705.05414
[20]

A Network-based End-to-End Trainable Task-oriented Dialogue System

Wen, Tsung-Hsien and Vandyke, David and Mrksic, Nikola and Gasic, Milica and Rojas-Barahona, Lina M. and Su, Pei-Hao and Ultes, Stefan and Young, Steve , urldate =. A Network-based End-to-End Trainable Task-oriented Dialogue System , rights =. doi:10.48550/ARXIV.1604.04562 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1604.04562
[21]

Multiwoz-a large-scale multi- domain wizard-of-oz dataset for task-oriented dialogue mod- elling,

Budzianowski, Paweł and Wen, Tsung-Hsien and Tseng, Bo-Hsiang and Casanueva, Iñigo and Ultes, Stefan and Ramadan, Osman and Gašić, Milica , urldate =. doi:10.48550/ARXIV.1810.00278 , abstract =

work page doi:10.48550/arxiv.1810.00278
[22]

doi:10.48550/ARXIV.2401.01330 , shorttitle =

Aliannejadi, Mohammad and Abbasiantaeb, Zahra and Chatterjee, Shubham and Dalton, Jeffery and Azzopardi, Leif , urldate =. doi:10.48550/ARXIV.2401.01330 , shorttitle =

work page doi:10.48550/arxiv.2401.01330
[23]

Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27,1990 , author =

The. Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27,1990 , author =

work page 1990
[24]

Gemini: A Family of Highly Capable Multimodal Models

Gemini: A Family of Highly Capable Multimodal Models , rights =. doi:10.48550/ARXIV.2312.11805 , shorttitle =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2312.11805
[25]

LLaMA: Open and Efficient Foundation Language Models

Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timothée and Rozière, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume , urldate =. doi:10.48550/ARXIV.2302.13971 , shorttitle =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2302.13971
[26]

doi:10.48550/ARXIV.2303.08774 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2303.08774
[27]

doi:10.48550/ARXIV.2304.12026 , shorttitle =

Zhan, Haolan and Li, Zhuang and Wang, Yufei and Luo, Linhao and Feng, Tao and Kang, Xiaoxi and Hua, Yuncheng and Qu, Lizhen and Soon, Lay-Ki and Sharma, Suraj and Zukerman, Ingrid and Semnani-Azad, Zhaleh and Haffari, Gholamreza , urldate =. doi:10.48550/ARXIV.2304.12026 , shorttitle =

work page doi:10.48550/arxiv.2304.12026
[28]

ParlAI: A Dialog Research Software Platform

Miller, Alexander H. and Feng, Will and Fisch, Adam and Lu, Jiasen and Batra, Dhruv and Bordes, Antoine and Parikh, Devi and Weston, Jason , urldate =. doi:10.48550/ARXIV.1705.06476 , shorttitle =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1705.06476
[29]

Journal of Public Health , author =

Barriers and facilitators of childhood flu vaccination: the views of parents in. Journal of Public Health , author =. 2022 , pages =. doi:10.1007/s10389-022-01695-2 , abstract =

work page doi:10.1007/s10389-022-01695-2 2022
[30]

Interview

Price, Timothy James , month = mar, year =. Interview. doi:10.25405/data.ncl.14242040.v1 , urldate =

work page doi:10.25405/data.ncl.14242040.v1
[31]

Interview

Lachman, Henry , year =. Interview. doi:10.7910/DVN/GDUOVS , urldate =

work page doi:10.7910/dvn/gduovs
[32]

Interview

Taherzadeh, Oliver , year =. Interview. doi:10.7910/DVN/4C9KFK , urldate =

work page doi:10.7910/dvn/4c9kfk
[33]

Interviews with 10 change leaders leading organizational change , url =

Sadaric, Antonio , year =. Interviews with 10 change leaders leading organizational change , url =. doi:10.7910/DVN/7JYGTG , urldate =

work page doi:10.7910/dvn/7jygtg
[34]

Replication

Tay, Hui Yong , year =. Replication. doi:10.7910/DVN/9X85KL , urldate =

work page doi:10.7910/dvn/9x85kl
[35]

2024 , pages =

Routledge Open Research , author =. 2024 , pages =. doi:10.12688/routledgeopenres.18443.1 , abstract =

work page doi:10.12688/routledgeopenres.18443.1 2024
[36]

doi:10.7910/DVN/WZ0BU5 , urldate =

Girard, Amy , year =. doi:10.7910/DVN/WZ0BU5 , urldate =

work page doi:10.7910/dvn/wz0bu5
[37]

Healthworker

Watkins, David , year =. Healthworker. doi:10.7910/DVN/CYRR9O , urldate =

work page doi:10.7910/dvn/cyrr9o
[38]

Relationship

Policastro, Sara , month = jun, year =. Relationship

work page
[39]

doi:10.48550/ARXIV.2412.10424 , shorttitle =

Kim, Eunsu and Suk, Juyoung and Kim, Seungone and Muennighoff, Niklas and Kim, Dongkwan and Oh, Alice , urldate =. doi:10.48550/ARXIV.2412.10424 , shorttitle =

work page doi:10.48550/arxiv.2412.10424
[40]

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

Zheng, Lianmin and Chiang, Wei-Lin and Sheng, Ying and Zhuang, Siyuan and Wu, Zhanghao and Zhuang, Yonghao and Lin, Zi and Li, Zhuohan and Li, Dacheng and Xing, Eric P. and Zhang, Hao and Gonzalez, Joseph E. and Stoica, Ion , urldate =. Judging. doi:10.48550/ARXIV.2306.05685 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2306.05685
[41]

CoQA: A Conversational Question Answering Challenge

Reddy, Siva and Chen, Danqi and Manning, Christopher D. , year =. doi:10.48550/ARXIV.1808.07042 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1808.07042
[42]

Better Zero-Shot Reasoning with Role-Play Prompting , rights =

Kong, Aobo and Zhao, Shiwan and Chen, Hao and Li, Qicheng and Qin, Yong and Sun, Ruiqi and Zhou, Xin and Wang, Enzhi and Dong, Xiaohang , urldate =. Better Zero-Shot Reasoning with Role-Play Prompting , rights =. doi:10.48550/ARXIV.2308.07702 , abstract =

work page doi:10.48550/arxiv.2308.07702
[43]

Transformers: State-of-the-Art Natural Language Processing

Wolf, Thomas and Debut, Lysandre and Sanh, Victor and Chaumond, Julien and Delangue, Clement and Moi, Anthony and Cistac, Pierric and Rault, Tim and Louf, Remi and Funtowicz, Morgan and Davison, Joe and Shleifer, Sam and Von Platen, Patrick and Ma, Clara and Jernite, Yacine and Plu, Julien and Xu, Canwen and Le Scao, Teven and Gugger, Sylvain and Drame, M...

work page doi:10.18653/v1/2020.emnlp-demos.6 2020
[44]

Fitzpatrick, Kathleen Kara and Darcy, Alison and Vierhile, Molly , urldate =. Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial , volume =. doi:10.2196/mental.7785 , shorttitle =

work page doi:10.2196/mental.7785
[45]

QuAC : Question Answering in Context

Choi, Eunsol and He, He and Iyyer, Mohit and Yatskar, Mark and Yih, Wen-tau and Choi, Yejin and Liang, Percy and Zettlemoyer, Luke , urldate =. doi:10.48550/arXiv.1808.07036 , shorttitle =. 1808.07036 , keywords =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1808.07036
[46]

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

Bajaj, Payal and Campos, Daniel and Craswell, Nick and Deng, Li and Gao, Jianfeng and Liu, Xiaodong and Majumder, Rangan and. doi:10.48550/arXiv.1611.09268 , shorttitle =. 1611.09268 , keywords =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1611.09268
[47]

Automatic question generation: a review of methodologies, datasets, evaluation metrics, and applications , volume =

Mulla, Nikahat and Gharpure, Prachi , urldate =. Automatic question generation: a review of methodologies, datasets, evaluation metrics, and applications , volume =. doi:10.1007/s13748-023-00295-9 , shorttitle =

work page doi:10.1007/s13748-023-00295-9
[48]

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) , year=

CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning , author=. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) , year=

work page 2022
[49]

A ir C oncierge: Generating Task-Oriented Dialogue via Efficient Large-Scale Knowledge Retrieval

Chen, Chieh-Yang and Wang, Pei-Hsin and Chang, Shih-Chieh and Juan, Da-Cheng and Wei, Wei and Pan, Jia-Yu. A ir C oncierge: Generating Task-Oriented Dialogue via Efficient Large-Scale Knowledge Retrieval. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. doi:10.18653/v1/2020.findings-emnlp.79

work page doi:10.18653/v1/2020.findings-emnlp.79 2020
[50]

Neural Approaches to Conversational AI

Gao, Jianfeng and Galley, Michel and Li, Lihong. Neural Approaches to Conversational AI. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts. 2018. doi:10.18653/v1/P18-5002

work page doi:10.18653/v1/p18-5002 2018
[51]

Learning from Real Users: Rating Dialogue Success with Neural Networks for Reinforcement Learning in Spoken Dialogue Systems , doi =

Su, Pei-Hao and Vandyke, David and Gaši´c, Milica and Kim, Dongho and Mrkši´c, Nikola and Wen, Tsung Hsien and Young, Steve , year =. Learning from Real Users: Rating Dialogue Success with Neural Networks for Reinforcement Learning in Spoken Dialogue Systems , doi =

work page
[52]

, journal=

Young, Steve and Gašić, Milica and Thomson, Blaise and Williams, Jason D. , journal=. POMDP-Based Statistical Spoken Dialog Systems: A Review , year=

work page
[53]

Book Reviews: Spoken Natural Language Dialogue Systems: A Practical Approach

Traum, David R. Book Reviews: Spoken Natural Language Dialogue Systems: A Practical Approach. Computational Linguistics. 1996

work page 1996
[54]

Zero-Shot Transfer Learning with Synthesized Data for Multi-Domain Dialogue State Tracking

Campagna, Giovanni and Foryciarz, Agata and Moradshahi, Mehrad and Lam, Monica. Zero-Shot Transfer Learning with Synthesized Data for Multi-Domain Dialogue State Tracking. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.12

work page doi:10.18653/v1/2020.acl-main.12 2020
[55]

A Simple Language Model for Task-Oriented Dialogue , url =

Hosseini-Asl, Ehsan and McCann, Bryan and Wu, Chien-Sheng and Yavuz, Semih and Socher, Richard , booktitle =. A Simple Language Model for Task-Oriented Dialogue , url =

work page
[56]

A Neural Conversational Model

Vinyals, Oriol and Le, Quoc , keywords =. A Neural Conversational Model , publisher =. 2015 , copyright =. doi:10.48550/ARXIV.1506.05869 , url =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1506.05869 2015
[57]

O pen D ial KG : Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs

Moon, Seungwhan and Shah, Pararth and Kumar, Anuj and Subba, Rajen. O pen D ial KG : Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1081

work page doi:10.18653/v1/p19-1081 2019
[58]

Augmenting End-to-End Dialog Systems with Commonsense Knowledge

Young, Tom and Cambria, Erik and Chaturvedi, Iti and Huang, Minlie and Zhou, Hao and Biswas, Subham , keywords =. Augmenting End-to-End Dialog Systems with Commonsense Knowledge , publisher =. 2017 , copyright =. doi:10.48550/ARXIV.1709.05453 , url =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1709.05453 2017
[59]

M em2 S eq: Effectively Incorporating Knowledge Bases into End-to-End Task-Oriented Dialog Systems

Madotto, Andrea and Wu, Chien-Sheng and Fung, Pascale. M em2 S eq: Effectively Incorporating Knowledge Bases into End-to-End Task-Oriented Dialog Systems. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018. doi:10.18653/v1/P18-1136

work page doi:10.18653/v1/p18-1136 2018
[60]

Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems

Madotto, Andrea and Cahyawijaya, Samuel and Winata, Genta Indra and Xu, Yan and Liu, Zihan and Lin, Zhaojiang and Fung, Pascale. Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020. doi:10.18653/v1/2020.findings-emnlp.215

work page doi:10.18653/v1/2020.findings-emnlp.215 2020
[61]

(2019, July)

Aliannejadi, Mohammad and Zamani, Hamed and Crestani, Fabio and Croft, W. Bruce , title =. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval , pages =. 2019 , isbn =. doi:10.1145/3331184.3331265 , abstract =

work page doi:10.1145/3331184.3331265 2019
[62]

Yu, Shi and Liu, Jiahua and Yang, Jingqin and Xiong, Chenyan and Bennett, Paul and Gao, Jianfeng and Liu, Zhiyuan. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020. doi:10.1145/3397271.3401323
[63]

Qu, Chen and Yang, Liu and Chen, Cen and Qiu, Minghui and Croft, W. Bruce and Iyyer, Mohit. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020. doi:10.1145/3397271.3401110
[64]

Controlling the Risk of Conversational Search via Reinforcement Learning

Wang, Zhenduo and Ai, Qingyao. Controlling the Risk of Conversational Search via Reinforcement Learning. 2021. doi:10.48550/ARXIV.2101.06327
[65]

Task-Oriented Dialog Systems that Consider Multiple Appropriate Responses under the Same Context

Zhang, Yichi and Ou, Zhijian and Yu, Zhou. Task-Oriented Dialog Systems that Consider Multiple Appropriate Responses under the Same Context. 2019. doi:10.48550/ARXIV.1911.10484
[66]

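Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System

Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System. ArXiv.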
[67]

Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models

Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models. arXiv preprint arXiv:1902.08858.
[68]

A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems

Asri, Layla El and He, Jing and Suleman, Kaheer. A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems. 2016. doi:10.48550/ARXIV.1607.00070
[69]

High-Quality Diversification for Task-Oriented Dialogue Systems

Tang, Zhiwen and Kulkarni, Hrishikesh and Yang, Grace Hui. High-Quality Diversification for Task-Oriented Dialogue Systems. 2021. doi:10.48550/ARXIV.2106.00891
[70]

Language models are unsupervised multitask learners

Language models are unsupervised multitask learners. OpenAI blog.
[71]

Linear regression analysis

Linear regression analysis. 2003.
[72]

Mnih, Volodymyr and Kavukcuoglu, Koray and Silver, David and Graves, Alex and Antonoglou, Ioannis and Wierstra, Daan and Riedmiller, Martin.
[73]

Deep Reinforcement Learning with Double Q-Learning

Deep Reinforcement Learning with Double Q-Learning. Proceedings of the AAAI Conference on Artificial Intelligence. 2016. doi:10.1609/aaai.v30i1.10295
[74]

Dhariwal, Prafulla and Hesse, Christopher and Klimov, Oleg and Nichol, Alex and Plappert, Matthias and Radford, Alec and Schulman, John and Sidor, Szymon and Wu, Yuhuai and Zhokhov, Peter. GitHub repository. 2017.
[75]

Decoupling Strategy and Generation in Negotiation Dialogues

He, He and Chen, Derek and Balakrishnan, Anusha and Liang, Percy. Decoupling Strategy and Generation in Negotiation Dialogues. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. doi:10.18653/v1/D18-1256

[76]

Ensemble Learning

Polikar, Robi. Ensemble Learning. Ensemble Machine Learning: Methods and Applications. 2012. doi:10.1007/978-1-4419-9326-7_1

[77]

SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning

SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning. 2021.
[78]

Ensemble Algorithms in Reinforcement Learning

Wiering, Marco A. and van Hasselt, Hado. Ensemble Algorithms in Reinforcement Learning.
[79]

Croft, Bruce and Metzler, Donald and Strohman, Trevor. 2009.
[80]

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Reimers, Nils and Gurevych, Iryna. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. 2019. doi:10.48550/ARXIV.1908.10084

Showing first 80 references.
