Survey on reinforcement learning for language processing

Anabel Martin-Gonzalez; Cornelius Weber; Nicolas Navarro-Guerrero; Stefan Wermter; Victor Uc-Cetina

arXiv: 2104.05565 · v3 · submitted 2021-04-12 · cs.CL · cs.AI · cs.LG

Pith reviewed 2026-05-24 12:51 UTC · model grok-4.3

classification cs.CL · cs.AI · cs.LG

keywords reinforcement learning · natural language processing · conversational systems · dialogue systems · deep reinforcement learning · survey

The pith

Reinforcement learning algorithms are well suited to various natural language processing tasks, especially those that arise in conversational systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper surveys reinforcement learning applications across natural language processing tasks. It centers on conversational systems due to their rising importance and provides problem descriptions along with explanations of why RL fits each case. The authors examine the benefits and drawbacks of these methods and outline promising future research directions in NLP that could leverage RL.

Core claim

The paper establishes that reinforcement learning methods, including deep neural variants, have been applied to conversational systems and other NLP problems, with detailed accounts of the tasks involved and the reasons RL is appropriate for handling their sequential and reward-based nature.

What carries the argument

Structured review of RL methods applied to NLP, emphasizing why sequential decision-making and long-term reward optimization suit dialogue and language tasks.

If this is right

RL enables optimization of long-term dialogue outcomes instead of isolated responses.
Deep RL combinations improve performance on complex language interaction tasks.
Identified limitations such as training instability point to specific areas for method improvement.
Other NLP domains like machine translation could adopt similar RL approaches.
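
The first point above, optimizing long-term dialogue outcomes rather than isolated responses, can be illustrated with a minimal tabular Q-learning sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-slot booking dialogue, the action names, the reward values (-1 per question, +10 for closing once both slots are filled, 0 for closing early) and the hyperparameters. An agent that looks only at the immediate reward would close at once (0 is better than -1), whereas the learned action values account for the delayed +10.

```python
import random

# Hypothetical toy dialogue: the agent must learn two slots before closing.
ACTIONS = ["ask_date", "ask_city", "close"]


def step(state, action):
    """Toy environment (an assumption). State is (date_known, city_known)."""
    date_known, city_known = state
    if action == "ask_date":
        return (True, city_known), -1.0, False
    if action == "ask_city":
        return (date_known, True), -1.0, False
    # "close" ends the dialogue and only pays off once both slots are filled.
    reward = 10.0 if (date_known and city_known) else 0.0
    return state, reward, True


def train(episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.5, max_turns=10, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated discounted return
    for _ in range(episodes):
        state = (False, False)
        for _ in range(max_turns):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            next_state, reward, done = step(state, action)
            # Bootstrap from the best next action unless the dialogue ended.
            best_next = 0.0 if done else max(
                q.get((next_state, a), 0.0) for a in ACTIONS
            )
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
            if done:
                break
    return q


def greedy_dialogue(q, max_turns=10):
    """Run one dialogue that always takes the highest-valued action."""
    state, actions, total = (False, False), [], 0.0
    for _ in range(max_turns):
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        state, reward, done = step(state, action)
        actions.append(action)
        total += reward
        if done:
            break
    return actions, total


q_table = train()
plan, total_reward = greedy_dialogue(q_table)
print(plan, total_reward)
```

In this toy setup the greedy policy read off the learned table asks for both slots before closing, accepting two small immediate penalties for the larger final reward. Real dialogue systems have far larger state and action spaces, which is where function approximation with deep networks comes in.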

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Direct head-to-head benchmarks of different RL algorithms on shared NLP tasks would clarify relative strengths.
Combining RL with supervised pretraining might mitigate sample-efficiency issues in language domains.

Load-bearing premise

The reviewed literature and analyses of advantages and limitations accurately capture the state of the art without significant selection bias or omission of key works.

What would settle it

Discovery of multiple major RL-for-NLP papers from the covered period that the survey omits would undermine its claim to review the state of the art.

Figures

Figures reproduced from arXiv: 2104.05565 by Anabel Martin-Gonzalez, Cornelius Weber, Nicolas Navarro-Guerrero, Stefan Wermter, Victor Uc-Cetina.

**Figure 2.** Grammar G1 with 4 production rules. The language L(G1) generated by grammar G1 is an infinite set of strings. Each of these strings is created by starting with the initial variable S and iteratively selecting and applying one of the production rules in G1, also called substitution rules. For example, the string 0#1 is a valid string belonging to L(G1) and it can be generated by applying the following seque…

**Figure 3.** Parse tree of string 00#11 generated from grammar …

**Figure 4.** Schematic view of a reinforcement learning agent designed for syntactic parsing.

**Figure 5.** Grammar defining valid sentences in English. Grammar adapted from [118].

**Figure 6.** Schematic view of a reinforcement learning agent designed for text understanding, …

**Figure 7.** Schematic view of a reinforcement learning agent designed for language generation, …

**Figure 8.** Schematic view of a reinforcement learning agent designed for language translation.

**Figure 9.** Sequence-to-sequence RNN architecture for machine translation, adapted …

**Figure 10.** Information flow of a conversational system. This system receives as input a text …
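
The derivation process described in the Figure 2 caption (start from the variable S, then repeatedly apply a production rule) can be sketched in a few lines of Python. The grammar below is a hypothetical two-rule stand-in, not the paper's G1, whose four rules are not reproduced on this page; it was chosen only because it can produce the strings 0#1 and 00#11 mentioned in the captions. The function name `derive` and the rule encoding are likewise assumptions.

```python
# Hypothetical context-free grammar (NOT the paper's G1):
#   S -> 0 S 1
#   S -> #
RULES = {"S": ["0S1", "#"]}


def derive(choices, start="S"):
    """Apply the chosen rule indices, in order, to the leftmost variable.

    Returns every intermediate string of the derivation, including the start.
    """
    current = start
    steps = [current]
    for rule_index in choices:
        position = next(
            (i for i, ch in enumerate(current) if ch in RULES), None
        )
        if position is None:
            raise ValueError(f"no variable left to expand in {current!r}")
        variable = current[position]
        replacement = RULES[variable][rule_index]
        current = current[:position] + replacement + current[position + 1:]
        steps.append(current)
    return steps


print(" => ".join(derive([0, 1])))
print(" => ".join(derive([0, 0, 1])))
```

Each entry in `choices` is one decision about which rule to apply next, so a derivation is a sequence of discrete choices. That sequential structure is what makes a decision-making formulation, such as the parsing agent in Figure 4, a natural fit.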

Original abstract

In recent years some researchers have explored the use of reinforcement learning (RL) algorithms as key components in the solution of various natural language processing tasks. For instance, some of these algorithms leveraging deep neural learning have found their way into conversational systems. This paper reviews the state of the art of RL methods for their possible use for different problems of natural language processing, focusing primarily on conversational systems, mainly due to their growing relevance. We provide detailed descriptions of the problems as well as discussions of why RL is well-suited to solve them. Also, we analyze the advantages and limitations of these methods. Finally, we elaborate on promising research directions in natural language processing that might benefit from reinforcement learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 0 minor

Summary. The paper surveys reinforcement learning (RL) methods applied to natural language processing (NLP) tasks, with primary emphasis on conversational systems. It describes relevant problem settings, explains the suitability of RL for these tasks, analyzes advantages and limitations of the reviewed approaches, and outlines promising future research directions.

Significance. A balanced survey of this form can usefully consolidate the literature on RL for dialogue and related NLP problems, helping researchers identify established techniques and open questions in the area.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the survey and for recommending acceptance. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; survey paper with no derivations

This is a literature survey reviewing prior RL applications to NLP (esp. conversational systems). It describes problem settings, discusses RL suitability, lists advantages/limitations, and suggests directions. No original equations, predictions, fitted parameters, or theorems are advanced, so none of the enumerated circularity patterns (self-definitional, fitted-input-called-prediction, self-citation load-bearing, etc.) can apply. The central claim reduces to an assertion that the reviewed external literature supports RL suitability, which is not internally circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This survey paper introduces no free parameters, axioms, or invented entities as it does not present original derivations or models.

Reference graph

Works this paper leans on

151 extracted references · 151 canonical work pages · 9 internal anchors

[1] A. Antunes, A. Laflaquiere, T. Ogata, and A. Cangelosi. A Bi-directional Multiple Timescales LSTM Model for Grounding of Actions and Verbs. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2614–2621, Macau, China, Nov. 2019.
[2] S. Arora, Y. Liang, and T. Ma. A Simple but Tough-to-Beat Baseline for Sentence Embeddings. In International Conference on Learning Representations (ICLR), Toulon, France, Apr. 2017. OpenReview.net.
[3] D. Bahdanau, K. Cho, and Y. Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. In International Conference on Learning Representations (ICLR), San Diego, CA, USA, May 2015. arXiv.
[4] S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. In International Conference on Neural Information Processing Systems (NIPS), volume 1, pages 1171–1179, Montreal, QC, Canada, Dec. 2015. MIT Press.
[5] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 5:135–146, Dec. 2017.
[6] C. Bothe, S. Magg, C. Weber, and S. Wermter. Dialogue-Based Neural Learning to Estimate the Sentiment of a Next Upcoming Utterance. In A. Lintas, S. Rovetta, P. F. Verschure, and A. E. Villa, editors, International Conference on Artificial Neural Networks (ICANN), volume 10614 of Lecture Notes in Computer Science, pages 477–485, Alghero, Italy, Sept
[7] Springer International Publishing
[8] S. R. K. Branavan, D. Silver, and R. Barzilay. Learning to Win by Reading Manuals in a Monte-Carlo Framework. Journal of Artificial Intelligence Research, 43:661–704, Apr. 2012.
[9] P. F. Brown, J. Cocke, S. A. D. Pietra, V. J. D. Pietra, F. Jelinek, J. D. Lafferty, R. L. Mercer, and P. S. Roossin. A Statistical Approach to Machine Translation. Computational Linguistics, 16(2):79–85, June 1990.
[10] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amod...
[11] A. Cangelosi and D. Parisi, editors. Simulating the Evolution of Language. Springer-Verlag, London, 2002.
[12] R. Cao, S. Zhu, C. Liu, J. Li, and K. Yu. Semantic Parsing with Dual Learning. In Annual Meeting of the Association for Computational Linguistics (ACL), volume 57th, pages 51–64, Florence, Italy, July 2019. Association for Computational Linguistics.
[13] D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, Y.-H. Sung, B. Strope, and R. Kurzweil. Universal Sentence Encoder. arXiv:1803.11175 [cs], Apr. 2018.
[14] T. Che, Y. Li, R. Zhang, R. D. Hjelm, W. Li, Y. Song, and Y. Bengio. Maximum-Likelihood Augmented Discrete Generative Adversarial Networks. arXiv:1702.07983 [cs], Feb. 2017.
[15] D. Chen, A. Fisch, J. Weston, and A. Bordes. Reading Wikipedia to Answer Open-Domain Questions. In Annual Meeting of the Association for Computational Linguistics (ACL), volume 55th, pages 1870–1879, Vancouver, BC, Canada, July 2017. Association for Computational Linguistics.
[16] L. Chen, R. Yang, C. Chang, Z. Ye, X. Zhou, and K. Yu. On-Line Dialogue Policy Learning with Companion Teaching. In Conference of the European Chapter of the Association for Computational Linguistics (EACL), volume 15th of Short Papers, pages 198–204, Valencia, Spain, Apr. 2017. Association for Computational Linguistics.
[17] L. Chen, X. Zhou, C. Chang, R. Yang, and K. Yu. Agent-Aware Dropout DQN for Safe and Efficient on-Line Dialogue Policy Learning. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2454–2464, Copenhagen, Denmark, Sept. 2017. Association for Computational Linguistics.
[18] Z. Chen, L. Chen, X. Liu, and K. Yu. Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28:2400–2411, 2020.
[19] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning Phrase Representations Using RNN Encoder–Decoder for Statistical Machine Translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar, Oct. 2014. Association for Computational Linguistics.
[20] N. Chomsky. On Certain Formal Properties of Grammars. Information and Control, 2(2):137–167, June 1959.
[21] N. Chomsky. Aspects of the Theory of Syntax. The MIT Press, Cambridge, Mass, May 1965.
[22] A. Conneau, D. Kiela, H. Schwenk, L. Barrault, and A. Bordes. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 670–680, Copenhagen, Denmark, Sept. 2017. Association for Computational Linguistics.
[23] P. A. Crook, S. Keizer, Z. Wang, W. Tang, and O. Lemon. Real User Evaluation of a POMDP Spoken Dialogue System Using Automatic Belief Compression. Computer Speech & Language, 28(4):873–887, July 2014.
[24] F. Cruz, S. Magg, Y. Nagai, and S. Wermter. Improving Interactive Reinforcement Learning: What Makes a Good Teacher? Connection Science, 30(3):306–325, Mar. 2018.
[25] F. Cruz, G. I. Parisi, and S. Wermter. Multi-modal Feedback for Affordance-driven Interactive Reinforcement Learning. In International Joint Conference on Neural Networks (IJCNN), pages 1–8, Rio de Janeiro, Brazil, July 2018.
[26] H. Cuayáhuitl, I. Kruijff-Korbayová, and N. Dethlefs. Nonstrict Hierarchical Reinforcement Learning for Interactive Systems and Robots. ACM Transactions on Interactive Intelligent Systems, 4(3):15:1–15:30, Oct. 2014.
[27] A. Das, S. Kottur, J. M. F. Moura, S. Lee, and D. Batra. Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning. In IEEE International Conference on Computer Vision (ICCV), pages 2951–2960, Venice, Italy, Oct. 2017.
[28] R. Das, S. Dhuliawala, M. Zaheer, L. Vilnis, I. Durugkar, A. Krishnamurthy, A. Smola, and A. McCallum. Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases Using Reinforcement Learning. In International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 2018.
[29] H. Daumé III, J. Langford, and D. Marcu. Search-Based Structured Prediction. Machine Learning, 75(3):297–325, June 2009.
[30] Y. Deng, X. Guo, N. Zhang, D. Guo, H. Liu, and F. Sun. MQA: Answering the Question via Robotic Manipulation. arXiv:2003.04641 [cs], Dec. 2020.
[31] N. Dethlefs and H. Cuayáhuitl. Combining Hierarchical Reinforcement Learning and Bayesian Networks for Natural Language Generation in Situated Dialogue. In European Workshop on Natural Language Generation (ENLG), volume 11, pages 110–120, Nancy, France, Sept. 2011. Association for Computational Linguistics.
[32] N. Dethlefs and H. Cuayáhuitl. Hierarchical Reinforcement Learning and Hidden Markov Models for Task-Oriented Natural Language Generation. In Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL), volume 49 of Short Papers, pages 654–659, Portland, OR, USA, June 2011. Association for Computational Linguistics.
[33] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), pages 4171–4186, Minneapolis, MN, USA, June 2019. Association for Computational Linguistics.
[34] J. Devlin, R. Zbib, Z. Huang, T. Lamar, R. Schwartz, and J. Makhoul. Fast and Robust Neural Network Joint Models for Statistical Machine Translation. In Annual Meeting of the Association for Computational Linguistics (ACL), volume 52nd, pages 1370–1380, Baltimore, MD, USA, June 2014. Association for Computational Linguistics.
[35] A. Eisermann, J. H. Lee, C. Weber, and S. Wermter. Generalization in Multimodal Language Learning from Simulation. In International Joint Conference on Neural Networks (IJCNN), pages 1–8, Shenzhen, China, 2021.
[36] M. Eppe, P. D. H. Nguyen, and S. Wermter. From Semantics to Execution: Integrating Action Planning With Reinforcement Learning for Robotic Causal Problem-Solving. Frontiers in Robotics and AI, 6(123), Nov. 2019.
[37] C. Fügen, A. Waibel, and M. Kolss. Simultaneous Translation of Lectures and Speeches. Machine Translation, 21(4):209–252, Dec. 2007.
[38] J. Gao, M. Galley, and L. Li. Neural Approaches to Conversational AI. In International ACM SIGIR Conference on Research & Development in Information Retrieval, volume 41st, pages 1371–1374, Ann Arbor, MI, USA, June 2018. Association for Computing Machinery.
[39] Y. Gao, C. Meyer, M. Mesgar, and I. Gurevych. Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation. In International Joint Conference on Artificial Intelligence (IJCAI), 19th, pages 2350–2356, Macao, China, 2019. AAAI Press.
[40] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Nets. In Advances in Neural Information Processing Systems (NIPS), volume 27, pages 2672–2680, Montreal, QC, Canada, Dec. 2014. Curran Associates, Inc.
[41] A. Grissom II, H. He, J. Boyd-Graber, J. Morgan, and H. Daumé III. Don’t Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1342–1352, Doha, Qatar, Oct. 2014. Association for Computational Linguistics.
[42] J. Gu, G. Neubig, K. Cho, and V. O. Li. Learning to Translate in Real-Time with Neural Machine Translation. In Conference of the European Chapter of the Association for Computational Linguistics (EACL), volume 15th, pages 1053–1062, Valencia, Spain, Apr. 2017. Association for Computational Linguistics.
[43] H. Guo. Generating Text with Deep Reinforcement Learning. In NIPS Deep Reinforcement Learning Workshop, Montreal, QC, Canada, 2015.
[44] J. Guo, S. Lu, H. Cai, W. Zhang, Y. Yu, and J. Wang. Long Text Generation Via Adversarial Training with Leaked Information. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1):5141–5148, Apr. 2018.
[45] X. Guo, T. Klinger, C. Rosenbaum, J. P. Bigus, M. Campbell, B. Kawas, K. Talamadupula, G. Tesauro, and S. Singh. Learning to Query, Reason, and Answer Questions on Ambiguous Texts. In International Conference on Learning Representations (ICLR), Toulon, France, Apr. 2017.
[46] M. B. Hafez, C. Weber, M. Kerzel, and S. Wermter. Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning. Paladyn, Journal of Behavioral Robotics, 10(1):14–29, Jan. 2019.
[47] M. B. Hafez, C. Weber, M. Kerzel, and S. Wermter. Improving Robot Dual-System Motor Learning with Intrinsically Motivated Meta-Control and Latent-Space Experience Imagination. Robotics and Autonomous Systems, 133:103630, Nov. 2020.
[48] H. Hassan, A. Aue, C. Chen, V. Chowdhary, J. Clark, C. Federmann, X. Huang, M. Junczys-Dowmunt, W. Lewis, M. Li, S. Liu, T.-Y. Liu, R. Luo, A. Menezes, T. Qin, F. Seide, X. Tan, F. Tian, L. Wu, S. Wu, Y. Xia, D. Zhang, Z. Zhang, and M. Zhou. Achieving Human Parity on Automatic Chinese to English News Translation. arXiv:1803.05567 [cs], June 2018.
[49] D. He, H. Lu, Y. Xia, T. Qin, L. Wang, and T.-Y. Liu. Decoding with Value Networks for Neural Machine Translation. In International Conference on Neural Information Processing Systems (NIPS), volume 30th, pages 177–186, Long Beach, CA, USA, Dec. 2017. Curran Associates Inc.
[50] D. He, Y. Xia, T. Qin, L. Wang, N. Yu, T.-Y. Liu, and W.-Y. Ma. Dual Learning for Machine Translation. In Advances in Neural Information Processing Systems (NIPS), volume 29, pages 820–828, Barcelona, Spain, Dec. 2016.
[51] J. He, J. Chen, X. He, J. Gao, L. Li, L. Deng, and M. Ostendorf. Deep Reinforcement Learning with a Natural Language Action Space. In Annual Meeting of the Association for Computational Linguistics (ACL), volume 54, pages 1621–1630, Berlin, Germany, Aug. 2016. Association for Computational Linguistics.
[52] J. He, M. Ostendorf, and X. He. Reinforcement Learning with External Knowledge and Two-Stage Q-Functions for Predicting Popular Reddit Threads. arXiv:1704.06217 [cs], Apr. 2017.
[53] J. He, M. Ostendorf, X. He, J. Chen, J. Gao, L. Li, and L. Deng. Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1838–1848, Austin, TX, USA, Nov. 2016. Association for Computational Linguistics.
[54] S. Heinrich, Y. Yao, T. Hinz, Z. Liu, T. Hummel, M. Kerzel, C. Weber, and S. Wermter. Crossmodal Language Grounding in an Embodied Neurocognitive Model. Frontiers in Neurorobotics, 14, 2020.
[55] J. Henderson, O. Lemon, and K. Georgila. Hybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Data Sets. Computational Linguistics, 34(4):487–511, July 2008.
[56] R. Higashinaka, M. Mizukami, K. Funakoshi, M. Araki, H. Tsukahara, and Y. Kobayashi. Fatal or Not? Finding Errors That Lead to Dialogue Breakdowns in Chat-Oriented Dialogue Systems. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2243–2248, Lisbon, Portugal, Sept. 2015. Association for Computational Linguistics.
[57] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, Nov. 1997.
[58] W. J. Hutchins and H. L. Somers. An Introduction to Machine Translation. Academic Press, London, Apr. 1992.
[59] J. Jiang, A. Teichert, J. Eisner, and H. Daumé III. Learned Prioritization for Trading Off Accuracy and Speed. In Advances in Neural Information Processing Systems (NIPS), volume 25, Lake Tahoe, NV, USA, Dec. 2012.
[60] F. Jurcicek, B. Thomson, S. Keizer, F. Mairesse, M. Gasic, K. Yu, and S. J. Young. Natural Belief-Critic: A Reinforcement Algorithm for Parameter Estimation in Statistical Spoken Dialogue Systems. In Annual Conference of the International Speech Communication Association (INTERSPEECH), pages 90–93, Makuhari, Japan, Sept. 2010.
[61] N. Kalchbrenner and P. Blunsom. Recurrent Continuous Translation Models. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1700–1709, Seattle, WA, USA, Oct. 2013. Association for Computational Linguistics.
[62] Y. Keneshloo, T. Shi, N. Ramakrishnan, and C. K. Reddy. Deep Reinforcement Learning for Sequence-to-Sequence Models. IEEE Transactions on Neural Networks and Learning Systems, 31(7):2469–2489, July 2020.
[63] R. Kiros, Y. Zhu, R. R. Salakhutdinov, R. Zemel, R. Urtasun, A. Torralba, and S. Fidler. Skip-Thought Vectors. In Advances in Neural Information Processing Systems (NIPS), volume 28, pages 3294–3302, Montreal, QC, Canada, 2015. Curran Associates, Inc.
[64] P. Koehn. Statistical Machine Translation. Cambridge University Press, Cambridge; New York, Dec. 2009.
[65] P. Koehn, F. J. Och, and D. Marcu. Statistical Phrase-Based Translation. In Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT-NAACL), pages 48–54, Edmonton, AB, Canada, May 2003. Association for Computational Linguistics.
[66] S. Kübler, R. McDonald, and J. Nivre. Dependency Parsing. Synthesis Lectures on Human Language Technologies, 2(1):1–127, Dec. 2008.
[67] K. Kudashkina, P. M. Pilarski, and R. S. Sutton. Document-Editing Assistants and Model-Based Reinforcement Learning as a Path to Conversational AI. arXiv:2008.12095 [cs], Aug. 2020.
[68] T. K. Lam, S. Schamoni, and S. Riezler. Interactive-Predictive Neural Machine Translation Through Reinforcement and Imitation. In Proceedings of Machine Translation Summit XVII Volume 1: Research Track, pages 96–106, Dublin, Ireland, Aug. 2019. European Association for Machine Translation.
[69] J. Langford and T. Zhang. The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits. In Advances in Neural Information Processing Systems (NIPS), volume 20th, pages 817–824, Vancouver, BC, Canada, 2007. Curran Associates Inc.
[70] M. Lê and A. Fokkens. Tackling Error Propagation Through Reinforcement Learning: A Case of Greedy Dependency Parsing. In Conference of the European Chapter of the Association for Computational Linguistics (EACL), volume 1, pages 677–687, Valencia, Spain, Apr. 2017. Association for Computational Linguistics.
[71] Q. Le and T. Mikolov. Distributed Representations of Sentences and Documents. In International Conference on Machine Learning (ICML), volume 32nd, pages 1188–1196, Beijing, China, June 2014. PMLR.
[72] Y. LeCun, Y. Bengio, and G. Hinton. Deep Learning. Nature, 521(7553):436–444, May 2015.
[73] O. Lemon. Learning What to Say and How to Say It: Joint Optimisation of Spoken Dialogue Management and Natural Language Generation. Computer Speech & Language, 25(2):210–221, Apr. 2011.
[74] E. Levin, R. Pieraccini, and W. Eckert. A Stochastic Model of Human-Machine Interaction for Learning Dialog Strategies. IEEE Transactions on Speech and Audio Processing, 8(1):11–23, Jan. 2000.
[75] J. Li, W. Monroe, A. Ritter, M. Galley, J. Gao, and D. Jurafsky. Deep Reinforcement Learning for Dialogue Generation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1192–1202, Austin, TX, USA, Nov. 2016. Association for Computational Linguistics.
[76] L. Li, W. Chu, J. Langford, and R. E. Schapire. A Contextual-Bandit Approach to Personalized News Article Recommendation. In International Conference on World Wide Web (WWW), volume 19th, pages 661–670, Raleigh, NC, USA, Apr. 2010. Association for Computing Machinery.
[77] X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz. End-to-End Task-Completion Neural Dialogue Systems. In International Joint Conference on Natural Language Processing (IJCNLP), pages 733–743, Taipei, Taiwan, Nov. 2017. Asian Federation of Natural Language Processing.
[78] X. Li, Z. C. Lipton, B. Dhingra, L. Li, J. Gao, and Y.-N. Chen. A User Simulator for Task-Completion Dialogues. arXiv:1612.05688 [cs], Nov. 2017.
[79] Z. Li, X. Jiang, L. Shang, and H. Li. Paraphrase Generation with Deep Reinforcement Learning. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3865–3878, Brussels, Belgium, 2018. Association for Computational Linguistics.
[80] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra. Continuous Control with Deep Reinforcement Learning. arXiv:1509.02971, Sept. 2015.

Showing first 80 references.

[1] [1]

Antunes, A

A. Antunes, A. Laﬂaquiere, T. Ogata, and A. Cangelosi. A Bi-directional Multiple Timescales LSTM Model for Grounding of Actions and Verbs. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages 2614–2621, Macau, China, Nov. 2019. 25

work page 2019

[2] [2]

Arora, Y

S. Arora, Y. Liang, and T. Ma. A Simple but Tough-to-Beat Baseline for Sentence Em- beddings. In International Conference on Learning Representations (ICLR) , Toulon, France, Apr. 2017. OpenReview.net

work page 2017

[3] [3]

Bahdanau, K

D. Bahdanau, K. Cho, and Y. Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. In International Conference on Learning Representations (ICLR) , San Diego, CA, USA, May 2015. arxiv

work page 2015

[4] [4]

Bengio, O

S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. In International Conference on Neural Information Pro- cessing Systems (NIPS), volume 1, pages 1171–1179, Montreal, QC, Canada, Dec. 2015. MIT Press

work page 2015

[5] [5]

Bojanowski, E

P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics , 5:135–146, Dec. 2017

work page 2017

[6] [6]

Bothe, S

C. Bothe, S. Magg, C. Weber, and S. Wermter. Dialogue-Based Neural Learning to Estimate the Sentiment of a Next Upcoming Utterance. In A. Lintas, S. Rovetta, P. F. Verschure, and A. E. Villa, editors, International Conference on Artiﬁcial Neural Networks (ICANN) , volume 10614 of Lecture Notes in Computer Science , pages 477–485, Alghero, Italy, Sept

work page

[7] [7]

Springer International Publishing

work page

[8] [8]

S. R. K. Branavan, D. Silver, and R. Barzilay. Learning to Win by Reading Manuals in a Monte-Carlo Framework. Journal of Artiﬁcial Intelligence Research , 43:661–704, Apr. 2012

work page 2012

[9] [9]

P. F. Brown, J. Cocke, S. A. D. Pietra, V. J. D. Pietra, F. Jelinek, J. D. Laﬀerty, R. L. Mercer, and P. S. Roossin. A Statistical Approach to Machine Translation. Computational Linguistics, 16(2):79–85, June 1990

work page 1990

[10] [10]

T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amod...

work page 2020

[11] [11]

Cangelosi and D

A. Cangelosi and D. Parisi, editors. Simulating the Evolution of Language . Springer-Verlag, London, 2002

work page 2002

[12] [12]

R. Cao, S. Zhu, C. Liu, J. Li, and K. Yu. Semantic Parsing with Dual Learning. In Annual Meeting of the Association for Computational Linguistics (ACL) , volume 57th, pages 51–64, Florence, Italy, July 2019. Association for Computational Linguistics

work page 2019

[13] [13]

D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, Y.-H. Sung, B. Strope, and R. Kurzweil. Universal Sentence Encoder. arXiv:1803.11175 [cs], Apr. 2018

[14] [14]

T. Che, Y. Li, R. Zhang, R. D. Hjelm, W. Li, Y. Song, and Y. Bengio. Maximum-Likelihood Augmented Discrete Generative Adversarial Networks. arXiv:1702.07983 [cs], Feb. 2017

[15] [15]

D. Chen, A. Fisch, J. Weston, and A. Bordes. Reading Wikipedia to Answer Open-Domain Questions. In Annual Meeting of the Association for Computational Linguistics (ACL), volume 55th, pages 1870–1879, Vancouver, BC, Canada, July 2017. Association for Computational Linguistics

[16] [16]

L. Chen, R. Yang, C. Chang, Z. Ye, X. Zhou, and K. Yu. On-Line Dialogue Policy Learning with Companion Teaching. In Conference of the European Chapter of the Association for Computational Linguistics (EACL), volume 15th of Short Papers, pages 198–204, Valencia, Spain, Apr. 2017. Association for Computational Linguistics

[17] [17]

L. Chen, X. Zhou, C. Chang, R. Yang, and K. Yu. Agent-Aware Dropout DQN for Safe and Efficient on-Line Dialogue Policy Learning. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2454–2464, Copenhagen, Denmark, Sept. 2017. Association for Computational Linguistics

[18] [18]

Z. Chen, L. Chen, X. Liu, and K. Yu. Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28:2400–2411, 2020


[19] [19]

K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning Phrase Representations Using RNN Encoder–Decoder for Statistical Machine Translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar, Oct. 2014. Association for Computational Linguistics

[20] [20]

N. Chomsky. On Certain Formal Properties of Grammars. Information and Control, 2(2):137–167, June 1959

[21] [21]

N. Chomsky. Aspects of the Theory of Syntax. The MIT Press, Cambridge, Mass, May 1965

[22] [22]

A. Conneau, D. Kiela, H. Schwenk, L. Barrault, and A. Bordes. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 670–680, Copenhagen, Denmark, Sept. 2017. Association for Computational Linguistics

[23] [23]

P. A. Crook, S. Keizer, Z. Wang, W. Tang, and O. Lemon. Real User Evaluation of a POMDP Spoken Dialogue System Using Automatic Belief Compression. Computer Speech & Language, 28(4):873–887, July 2014


[24] [24]

F. Cruz, S. Magg, Y. Nagai, and S. Wermter. Improving Interactive Reinforcement Learning: What Makes a Good Teacher? Connection Science, 30(3):306–325, Mar. 2018


[25] [25]

F. Cruz, G. I. Parisi, and S. Wermter. Multi-modal Feedback for Affordance-driven Interactive Reinforcement Learning. In International Joint Conference on Neural Networks (IJCNN), pages 1–8, Rio de Janeiro, Brazil, July 2018

[26] [26]

H. Cuayáhuitl, I. Kruijff-Korbayová, and N. Dethlefs. Nonstrict Hierarchical Reinforcement Learning for Interactive Systems and Robots. ACM Transactions on Interactive Intelligent Systems, 4(3):15:1–15:30, Oct. 2014

[27] [27]

A. Das, S. Kottur, J. M. F. Moura, S. Lee, and D. Batra. Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning. In IEEE International Conference on Computer Vision (ICCV), pages 2951–2960, Venice, Italy, Oct. 2017

[28] [28]

R. Das, S. Dhuliawala, M. Zaheer, L. Vilnis, I. Durugkar, A. Krishnamurthy, A. Smola, and A. McCallum. Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases Using Reinforcement Learning. In International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 2018


[29] [29]

H. Daumé III, J. Langford, and D. Marcu. Search-Based Structured Prediction. Machine Learning, 75(3):297–325, June 2009

[30] [30]

Y. Deng, X. Guo, N. Zhang, D. Guo, H. Liu, and F. Sun. MQA: Answering the Question via Robotic Manipulation. arXiv:2003.04641 [cs], Dec. 2020


[31] [31]

N. Dethlefs and H. Cuayáhuitl. Combining Hierarchical Reinforcement Learning and Bayesian Networks for Natural Language Generation in Situated Dialogue. In European Workshop on Natural Language Generation (ENLG), volume 11, pages 110–120, Nancy, France, Sept. 2011. Association for Computational Linguistics

[32] [32]

N. Dethlefs and H. Cuayáhuitl. Hierarchical Reinforcement Learning and Hidden Markov Models for Task-Oriented Natural Language Generation. In Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL), volume 49 of Short Papers, pages 654–659, Portland, OR, USA, June 2011. Association for Computational Linguistics

[33] [33]

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), pages 4171–4186, Minneapolis, MN, USA, June 2019. Association for Computational Linguistics

[34] [34]

J. Devlin, R. Zbib, Z. Huang, T. Lamar, R. Schwartz, and J. Makhoul. Fast and Robust Neural Network Joint Models for Statistical Machine Translation. In Annual Meeting of the Association for Computational Linguistics (ACL), volume 52nd, pages 1370–1380, Baltimore, MD, USA, June 2014. Association for Computational Linguistics

[35] [35]

A. Eisermann, J. H. Lee, C. Weber, and S. Wermter. Generalization in Multimodal Language Learning from Simulation. In International Joint Conference on Neural Networks (IJCNN), pages 1–8, Shenzhen, China, 2021

[36] [36]

M. Eppe, P. D. H. Nguyen, and S. Wermter. From Semantics to Execution: Integrating Action Planning With Reinforcement Learning for Robotic Causal Problem-Solving. Frontiers in Robotics and AI, 6(123), Nov. 2019


[37] [37]

C. Fügen, A. Waibel, and M. Kolss. Simultaneous Translation of Lectures and Speeches. Machine Translation, 21(4):209–252, Dec. 2007

[38] [38]

J. Gao, M. Galley, and L. Li. Neural Approaches to Conversational AI. In International ACM SIGIR Conference on Research & Development in Information Retrieval, volume 41st, pages 1371–1374, Ann Arbor, MI, USA, June 2018. Association for Computing Machinery

[39] [39]

Y. Gao, C. Meyer, M. Mesgar, and I. Gurevych. Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation. In International Joint Conference on Artificial Intelligence (IJCAI), 19th, pages 2350–2356, Macao, China, 2019. AAAI Press

[40] [40]

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Nets. In Advances in Neural Information Processing Systems (NIPS), volume 27, pages 2672–2680, Montreal, QC, Canada, Dec. 2014. Curran Associates, Inc

[41] [41]

A. Grissom II, H. He, J. Boyd-Graber, J. Morgan, and H. Daumé III. Don’t Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1342–1352, Doha, Qatar, Oct. 2014. Association for Computational Linguistics

[42] [42]

J. Gu, G. Neubig, K. Cho, and V. O. Li. Learning to Translate in Real-Time with Neural Machine Translation. In Conference of the European Chapter of the Association for Computational Linguistics (EACL), volume 15th, pages 1053–1062, Valencia, Spain, Apr. 2017. Association for Computational Linguistics

[43] [43]

H. Guo. Generating Text with Deep Reinforcement Learning. In NIPS Deep Reinforcement Learning Workshop, Montreal, QC, Canada, 2015


[44] [44]

J. Guo, S. Lu, H. Cai, W. Zhang, Y. Yu, and J. Wang. Long Text Generation Via Adversarial Training with Leaked Information. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1):5141–5148, Apr. 2018

[45] [45]

X. Guo, T. Klinger, C. Rosenbaum, J. P. Bigus, M. Campbell, B. Kawas, K. Talamadupula, G. Tesauro, and S. Singh. Learning to Query, Reason, and Answer Questions on Ambiguous Texts. In International Conference on Learning Representations (ICLR), Toulon, France, Apr. 2017

[46] [46]

M. B. Hafez, C. Weber, M. Kerzel, and S. Wermter. Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning. Paladyn, Journal of Behavioral Robotics, 10(1):14–29, Jan. 2019

[47] [47]

M. B. Hafez, C. Weber, M. Kerzel, and S. Wermter. Improving Robot Dual-System Motor Learning with Intrinsically Motivated Meta-Control and Latent-Space Experience Imagination. Robotics and Autonomous Systems, 133:103630, Nov. 2020

[48] [48]

H. Hassan, A. Aue, C. Chen, V. Chowdhary, J. Clark, C. Federmann, X. Huang, M. Junczys-Dowmunt, W. Lewis, M. Li, S. Liu, T.-Y. Liu, R. Luo, A. Menezes, T. Qin, F. Seide, X. Tan, F. Tian, L. Wu, S. Wu, Y. Xia, D. Zhang, Z. Zhang, and M. Zhou. Achieving Human Parity on Automatic Chinese to English News Translation. arXiv:1803.05567 [cs], June 2018

[49] [49]

D. He, H. Lu, Y. Xia, T. Qin, L. Wang, and T.-Y. Liu. Decoding with Value Networks for Neural Machine Translation. In International Conference on Neural Information Processing Systems (NIPS), volume 30th, pages 177–186, Long Beach, CA, USA, Dec. 2017. Curran Associates Inc

[50] [50]

D. He, Y. Xia, T. Qin, L. Wang, N. Yu, T.-Y. Liu, and W.-Y. Ma. Dual Learning for Machine Translation. In Advances in Neural Information Processing Systems (NIPS), volume 29, pages 820–828, Barcelona, Spain, Dec. 2016


[51] [51]

J. He, J. Chen, X. He, J. Gao, L. Li, L. Deng, and M. Ostendorf. Deep Reinforcement Learning with a Natural Language Action Space. In Annual Meeting of the Association for Computational Linguistics (ACL), volume 54, pages 1621–1630, Berlin, Germany, Aug. 2016. Association for Computational Linguistics

[52] [52]

J. He, M. Ostendorf, and X. He. Reinforcement Learning with External Knowledge and Two-Stage Q-Functions for Predicting Popular Reddit Threads. arXiv:1704.06217 [cs], Apr. 2017


[53] [53]

J. He, M. Ostendorf, X. He, J. Chen, J. Gao, L. Li, and L. Deng. Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1838–1848, Austin, TX, USA, Nov. 2016. Association for Computational Linguistics


[54] [54]

S. Heinrich, Y. Yao, T. Hinz, Z. Liu, T. Hummel, M. Kerzel, C. Weber, and S. Wermter. Crossmodal Language Grounding in an Embodied Neurocognitive Model. Frontiers in Neurorobotics, 14, 2020

[55] [55]

J. Henderson, O. Lemon, and K. Georgila. Hybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Data Sets. Computational Linguistics, 34(4):487–511, July 2008

[56] [56]

R. Higashinaka, M. Mizukami, K. Funakoshi, M. Araki, H. Tsukahara, and Y. Kobayashi. Fatal or Not? Finding Errors That Lead to Dialogue Breakdowns in Chat-Oriented Dialogue Systems. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2243–2248, Lisbon, Portugal, Sept. 2015. Association for Computational Linguistics

[57] [57]

S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, Nov. 1997

[58] [58]

W. J. Hutchins and H. L. Somers. An Introduction to Machine Translation. Academic Press, London, Apr. 1992


[59] [59]

J. Jiang, A. Teichert, J. Eisner, and H. Daumé III. Learned Prioritization for Trading Off Accuracy and Speed. In Advances in Neural Information Processing Systems (NIPS), volume 25, Lake Tahoe, NV, USA, Dec. 2012

[60] [60]

F. Jurcicek, B. Thomson, S. Keizer, F. Mairesse, M. Gasic, K. Yu, and S. J. Young. Natural Belief-Critic: A Reinforcement Algorithm for Parameter Estimation in Statistical Spoken Dialogue Systems. In Annual Conference of the International Speech Communication Association (INTERSPEECH), pages 90–93, Makuhari, Japan, Sept. 2010

[61] [61]

N. Kalchbrenner and P. Blunsom. Recurrent Continuous Translation Models. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1700–1709, Seattle, WA, USA, Oct. 2013. Association for Computational Linguistics

[62] [62]

Y. Keneshloo, T. Shi, N. Ramakrishnan, and C. K. Reddy. Deep Reinforcement Learning for Sequence-to-Sequence Models. IEEE Transactions on Neural Networks and Learning Systems, 31(7):2469–2489, July 2020

[63] [63]

R. Kiros, Y. Zhu, R. R. Salakhutdinov, R. Zemel, R. Urtasun, A. Torralba, and S. Fidler. Skip-Thought Vectors. In Advances in Neural Information Processing Systems (NIPS), volume 28, pages 3294–3302, Montreal, QC, Canada, 2015. Curran Associates, Inc

[64] [64]

P. Koehn. Statistical Machine Translation. Cambridge University Press, Cambridge; New York, Dec. 2009

[65] [65]

P. Koehn, F. J. Och, and D. Marcu. Statistical Phrase-Based Translation. In Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT-NAACL), pages 48–54, Edmonton, AB, Canada, May 2003. Association for Computational Linguistics

[66] [66]

S. Kübler, R. McDonald, and J. Nivre. Dependency Parsing. Synthesis Lectures on Human Language Technologies, 2(1):1–127, Dec. 2008

[67] [67]

K. Kudashkina, P. M. Pilarski, and R. S. Sutton. Document-Editing Assistants and Model-Based Reinforcement Learning as a Path to Conversational AI. arXiv:2008.12095 [cs], Aug. 2020

[68] [68]

T. K. Lam, S. Schamoni, and S. Riezler. Interactive-Predictive Neural Machine Translation Through Reinforcement and Imitation. In Proceedings of Machine Translation Summit XVII Volume 1: Research Track, pages 96–106, Dublin, Ireland, Aug. 2019. European Association for Machine Translation


[69] [69]

J. Langford and T. Zhang. The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits. In Advances in Neural Information Processing Systems (NIPS), volume 20th, pages 817–824, Vancouver, BC, Canada, 2007. Curran Associates Inc

[70] [70]

M. Lê and A. Fokkens. Tackling Error Propagation Through Reinforcement Learning: A Case of Greedy Dependency Parsing. In Conference of the European Chapter of the Association for Computational Linguistics (EACL), volume 1, pages 677–687, Valencia, Spain, Apr. 2017. Association for Computational Linguistics

[71] [71]

Q. Le and T. Mikolov. Distributed Representations of Sentences and Documents. In International Conference on Machine Learning (ICML), volume 32nd, pages 1188–1196, Beijing, China, June 2014. PMLR

[72] [72]

Y. LeCun, Y. Bengio, and G. Hinton. Deep Learning. Nature, 521(7553):436–444, May 2015

[73] [73]

O. Lemon. Learning What to Say and How to Say It: Joint Optimisation of Spoken Dialogue Management and Natural Language Generation. Computer Speech & Language, 25(2):210–221, Apr. 2011

[74] [74]

E. Levin, R. Pieraccini, and W. Eckert. A Stochastic Model of Human-Machine Interaction for Learning Dialog Strategies. IEEE Transactions on Speech and Audio Processing, 8(1):11–23, Jan. 2000

[75] [75]

J. Li, W. Monroe, A. Ritter, M. Galley, J. Gao, and D. Jurafsky. Deep Reinforcement Learning for Dialogue Generation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1192–1202, Austin, TX, USA, Nov. 2016. Association for Computational Linguistics


[76] [76]

L. Li, W. Chu, J. Langford, and R. E. Schapire. A Contextual-Bandit Approach to Personalized News Article Recommendation. In International Conference on World Wide Web (WWW), volume 19th, pages 661–670, Raleigh, NC, USA, Apr. 2010. Association for Computing Machinery

[77] [77]

X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz. End-to-End Task-Completion Neural Dialogue Systems. In International Joint Conference on Natural Language Processing (IJCNLP), pages 733–743, Taipei, Taiwan, Nov. 2017. Asian Federation of Natural Language Processing

[78] [78]

X. Li, Z. C. Lipton, B. Dhingra, L. Li, J. Gao, and Y.-N. Chen. A User Simulator for Task-Completion Dialogues. arXiv:1612.05688 [cs], Nov. 2017


[79] [79]

Z. Li, X. Jiang, L. Shang, and H. Li. Paraphrase Generation with Deep Reinforcement Learning. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3865–3878, Brussels, Belgium, 2018. Association for Computational Linguistics

[80] [80]

T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra. Continuous Control with Deep Reinforcement Learning. arXiv:1509.02971, Sept. 2015
