
arXiv: 2104.05565 · v3 · submitted 2021-04-12 · cs.CL · cs.AI · cs.LG

Survey on reinforcement learning for language processing

Pith reviewed 2026-05-24 12:51 UTC · model grok-4.3

classification: cs.CL · cs.AI · cs.LG
keywords: reinforcement learning, natural language processing, conversational systems, dialogue systems, deep reinforcement learning, survey

The pith

Reinforcement learning algorithms are well suited to a range of natural language processing tasks, especially those that arise in conversational systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper surveys reinforcement learning applications across natural language processing tasks. It centers on conversational systems due to their rising importance and provides problem descriptions along with explanations of why RL fits each case. The authors examine the benefits and drawbacks of these methods and outline promising future research directions in NLP that could leverage RL.

Core claim

The paper establishes that reinforcement learning methods, including deep neural variants, have been applied to conversational systems and other NLP problems, with detailed accounts of the tasks involved and the reasons RL is appropriate for handling their sequential and reward-based nature.

What carries the argument

Structured review of RL methods applied to NLP, emphasizing why sequential decision-making and long-term reward optimization suit dialogue and language tasks.

If this is right

  • RL enables optimization of long-term dialogue outcomes instead of isolated responses.
  • Deep RL combinations improve performance on complex language interaction tasks.
  • Identified limitations such as training instability point to specific areas for method improvement.
  • Other NLP domains like machine translation could adopt similar RL approaches.
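
The first point above, optimizing long-term dialogue outcomes rather than isolated responses, can be illustrated with a discounted return. The following is only a minimal sketch: the reward values, the discount factor `gamma=0.9`, and the names `discounted_return`, `greedy_rewards`, and `patient_rewards` are illustrative assumptions and do not come from the paper.

```python
def discounted_return(rewards, gamma=0.9):
    """Return the sum of per-turn rewards, discounted by gamma per turn."""
    total = 0.0
    # Work backwards so each step folds in the discounted future return.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical per-turn rewards for two dialogue strategies (assumed values).
greedy_rewards = [1.0, 0.0, 0.0]   # pleasing first reply, then the dialogue stalls
patient_rewards = [0.0, 0.0, 5.0]  # clarifying questions first, task solved at the end

greedy_return = discounted_return(greedy_rewards)
patient_return = discounted_return(patient_rewards)

print(f"greedy:  {greedy_return:.2f}")
print(f"patient: {patient_return:.2f}")
```

Under these assumed numbers the second strategy earns the higher return even though its first two turns earn nothing, which is the kind of trade-off a per-response objective cannot express.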

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Direct head-to-head benchmarks of different RL algorithms on shared NLP tasks would clarify relative strengths.
  • Combining RL with supervised pretraining might mitigate sample-efficiency issues in language domains.

Load-bearing premise

The reviewed literature and analyses of advantages and limitations accurately capture the state of the art without significant selection bias or omission of key works.

What would settle it

Discovery of multiple major RL-for-NLP papers from the covered period that the survey omits would undermine its claim to review the state of the art.

Figures

Figures reproduced from arXiv: 2104.05565 by Anabel Martin-Gonzalez, Cornelius Weber, Nicolas Navarro-Guerrero, Stefan Wermter, Victor Uc-Cetina.

Figure 1: Schematic view of a reinforcement learning agent designed for language processing.
Figure 2: Grammar G1 with 4 production rules. The language L(G1) generated by grammar G1 is an infinite set of strings. Each of these strings is created by starting with the initial variable S and iteratively selecting and applying one of the production rules in G1, also called substitution rules. For example, the string 0#1 is a valid string belonging to L(G1) and it can be generated by applying the following seque…
Figure 3: Parse tree of string 00#11 generated from grammar…
Figure 4: Schematic view of a reinforcement learning agent designed for syntactic parsing.
Figure 5: Grammar defining valid sentences in English; grammar adapted from [118].
Figure 6: Schematic view of a reinforcement learning agent designed for text understanding,…
Figure 7: Schematic view of a reinforcement learning agent designed for language generation,…
Figure 8: Schematic view of a reinforcement learning agent designed for language translation.
Figure 9: Sequence-to-sequence RNN architecture for machine translation, adapted…
Figure 10: Information flow of a conversational system. This system receives as input a text…

Original abstract

In recent years some researchers have explored the use of reinforcement learning (RL) algorithms as key components in the solution of various natural language processing tasks. For instance, some of these algorithms leveraging deep neural learning have found their way into conversational systems. This paper reviews the state of the art of RL methods for their possible use for different problems of natural language processing, focusing primarily on conversational systems, mainly due to their growing relevance. We provide detailed descriptions of the problems as well as discussions of why RL is well-suited to solve them. Also, we analyze the advantages and limitations of these methods. Finally, we elaborate on promising research directions in natural language processing that might benefit from reinforcement learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 0 minor

Summary. The paper surveys reinforcement learning (RL) methods applied to natural language processing (NLP) tasks, with primary emphasis on conversational systems. It describes relevant problem settings, explains the suitability of RL for these tasks, analyzes advantages and limitations of the reviewed approaches, and outlines promising future research directions.

Significance. A balanced survey of this form can usefully consolidate the literature on RL for dialogue and related NLP problems, helping researchers identify established techniques and open questions in the area.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the survey and for recommending acceptance. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; survey paper with no derivations

This is a literature survey reviewing prior RL applications to NLP (esp. conversational systems). It describes problem settings, discusses RL suitability, lists advantages/limitations, and suggests directions. No original equations, predictions, fitted parameters, or theorems are advanced, so none of the enumerated circularity patterns (self-definitional, fitted-input-called-prediction, self-citation load-bearing, etc.) can apply. The central claim reduces to an assertion that the reviewed external literature supports RL suitability, which is not internally circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This survey paper introduces no free parameters, axioms, or invented entities as it does not present original derivations or models.

Reference graph

Works this paper leans on

151 extracted references · 151 canonical work pages · 9 internal anchors

  1. [1]

    A. Antunes, A. Laflaquiere, T. Ogata, and A. Cangelosi. A Bi-directional Multiple Timescales LSTM Model for Grounding of Actions and Verbs. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2614–2621, Macau, China, Nov. 2019.

  2. [2]

    S. Arora, Y. Liang, and T. Ma. A Simple but Tough-to-Beat Baseline for Sentence Embeddings. In International Conference on Learning Representations (ICLR), Toulon, France, Apr. 2017. OpenReview.net

  3. [3]

    D. Bahdanau, K. Cho, and Y. Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. In International Conference on Learning Representations (ICLR), San Diego, CA, USA, May 2015. arxiv

  4. [4]

    S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. In International Conference on Neural Information Processing Systems (NIPS), volume 1, pages 1171–1179, Montreal, QC, Canada, Dec. 2015. MIT Press

  5. [5]

    P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 5:135–146, Dec. 2017

  6. [6]

    C. Bothe, S. Magg, C. Weber, and S. Wermter. Dialogue-Based Neural Learning to Estimate the Sentiment of a Next Upcoming Utterance. In A. Lintas, S. Rovetta, P. F. Verschure, and A. E. Villa, editors, International Conference on Artificial Neural Networks (ICANN), volume 10614 of Lecture Notes in Computer Science, pages 477–485, Alghero, Italy, Sept

  7. [7]

    Springer International Publishing

  8. [8]

    S. R. K. Branavan, D. Silver, and R. Barzilay. Learning to Win by Reading Manuals in a Monte-Carlo Framework. Journal of Artificial Intelligence Research, 43:661–704, Apr. 2012

  9. [9]

    P. F. Brown, J. Cocke, S. A. D. Pietra, V. J. D. Pietra, F. Jelinek, J. D. Lafferty, R. L. Mercer, and P. S. Roossin. A Statistical Approach to Machine Translation. Computational Linguistics, 16(2):79–85, June 1990

  10. [10]

    T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amod...

  11. [11]

    A. Cangelosi and D. Parisi, editors. Simulating the Evolution of Language. Springer-Verlag, London, 2002

  12. [12]

    R. Cao, S. Zhu, C. Liu, J. Li, and K. Yu. Semantic Parsing with Dual Learning. In Annual Meeting of the Association for Computational Linguistics (ACL), volume 57th, pages 51–64, Florence, Italy, July 2019. Association for Computational Linguistics

  13. [13]

    D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, Y.-H. Sung, B. Strope, and R. Kurzweil. Universal Sentence Encoder. arXiv:1803.11175 [cs], Apr. 2018

  14. [14]

    T. Che, Y. Li, R. Zhang, R. D. Hjelm, W. Li, Y. Song, and Y. Bengio. Maximum-Likelihood Augmented Discrete Generative Adversarial Networks. arXiv:1702.07983 [cs], Feb. 2017.

  15. [15]

    D. Chen, A. Fisch, J. Weston, and A. Bordes. Reading Wikipedia to Answer Open-Domain Questions. In Annual Meeting of the Association for Computational Linguistics (ACL), volume 55th, pages 1870–1879, Vancouver, BC, Canada, July 2017. Association for Computational Linguistics

  16. [16]

    L. Chen, R. Yang, C. Chang, Z. Ye, X. Zhou, and K. Yu. On-Line Dialogue Policy Learning with Companion Teaching. In Conference of the European Chapter of the Association for Computational Linguistics (EACL), volume 15th of Short Papers, pages 198–204, Valencia, Spain, Apr. 2017. Association for Computational Linguistics

  17. [17]

    L. Chen, X. Zhou, C. Chang, R. Yang, and K. Yu. Agent-Aware Dropout DQN for Safe and Efficient on-Line Dialogue Policy Learning. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2454–2464, Copenhagen, Denmark, Sept. 2017. Association for Computational Linguistics

  18. [18]

    Z. Chen, L. Chen, X. Liu, and K. Yu. Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28:2400–2411, 2020

  19. [19]

    K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning Phrase Representations Using RNN Encoder–Decoder for Statistical Machine Translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar, Oct. 2014. Association for Computational Linguistics

  20. [20]

    N. Chomsky. On Certain Formal Properties of Grammars. Information and Control, 2(2):137–167, June 1959

  21. [21]

    N. Chomsky. Aspects of the Theory of Syntax. The MIT Press, Cambridge, Mass, May 1965

  22. [22]

    A. Conneau, D. Kiela, H. Schwenk, L. Barrault, and A. Bordes. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 670–680, Copenhagen, Denmark, Sept. 2017. Association for Computational Linguistics

  23. [23]

    P. A. Crook, S. Keizer, Z. Wang, W. Tang, and O. Lemon. Real User Evaluation of a POMDP Spoken Dialogue System Using Automatic Belief Compression. Computer Speech & Language, 28(4):873–887, July 2014

  24. [24]

    F. Cruz, S. Magg, Y. Nagai, and S. Wermter. Improving Interactive Reinforcement Learning: What Makes a Good Teacher? Connection Science, 30(3):306–325, Mar. 2018

  25. [25]

    F. Cruz, G. I. Parisi, and S. Wermter. Multi-modal Feedback for Affordance-driven Interactive Reinforcement Learning. In International Joint Conference on Neural Networks (IJCNN), pages 1–8, Rio de Janeiro, Brazil, July 2018

  26. [26]

    H. Cuayáhuitl, I. Kruijff-Korbayová, and N. Dethlefs. Nonstrict Hierarchical Reinforcement Learning for Interactive Systems and Robots. ACM Transactions on Interactive Intelligent Systems, 4(3):15:1–15:30, Oct. 2014

  27. [27]

    A. Das, S. Kottur, J. M. F. Moura, S. Lee, and D. Batra. Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning. In IEEE International Conference on Computer Vision (ICCV), pages 2951–2960, Venice, Italy, Oct. 2017.

  28. [28]

    R. Das, S. Dhuliawala, M. Zaheer, L. Vilnis, I. Durugkar, A. Krishnamurthy, A. Smola, and A. McCallum. Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases Using Reinforcement Learning. In International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 2018

  29. [29]

    H. Daumé III, J. Langford, and D. Marcu. Search-Based Structured Prediction. Machine Learning, 75(3):297–325, June 2009

  30. [30]

    Y. Deng, X. Guo, N. Zhang, D. Guo, H. Liu, and F. Sun. MQA: Answering the Question via Robotic Manipulation. arXiv:2003.04641 [cs], Dec. 2020

  31. [31]

    N. Dethlefs and H. Cuayáhuitl. Combining Hierarchical Reinforcement Learning and Bayesian Networks for Natural Language Generation in Situated Dialogue. In European Workshop on Natural Language Generation (ENLG), volume 11, pages 110–120, Nancy, France, Sept. 2011. Association for Computational Linguistics

  32. [32]

    N. Dethlefs and H. Cuayáhuitl. Hierarchical Reinforcement Learning and Hidden Markov Models for Task-Oriented Natural Language Generation. In Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL), volume 49 of Short Papers, pages 654–659, Portland, OR, USA, June 2011. Association for Computational Linguistics

  33. [33]

    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), pages 4171–4186, Minneapolis, MN, USA, June 2019. Association for Computational Linguistics

  34. [34]

    J. Devlin, R. Zbib, Z. Huang, T. Lamar, R. Schwartz, and J. Makhoul. Fast and Robust Neural Network Joint Models for Statistical Machine Translation. In Annual Meeting of the Association for Computational Linguistics (ACL), volume 52nd, pages 1370–1380, Baltimore, MD, USA, June 2014. Association for Computational Linguistics

  35. [35]

    A. Eisermann, J. H. Lee, C. Weber, and S. Wermter. Generalization in Multimodal Language Learning from Simulation. In International Joint Conference on Neural Networks (IJCNN), pages 1–8, Shenzhen, China, 2021

  36. [36]

    M. Eppe, P. D. H. Nguyen, and S. Wermter. From Semantics to Execution: Integrating Action Planning With Reinforcement Learning for Robotic Causal Problem-Solving. Frontiers in Robotics and AI, 6(123), Nov. 2019

  37. [37]

    C. Fügen, A. Waibel, and M. Kolss. Simultaneous Translation of Lectures and Speeches. Machine Translation, 21(4):209–252, Dec. 2007

  38. [38]

    J. Gao, M. Galley, and L. Li. Neural Approaches to Conversational AI. In International ACM SIGIR Conference on Research & Development in Information Retrieval, volume 41st, pages 1371–1374, Ann Arbor, MI, USA, June 2018. Association for Computing Machinery

  39. [39]

    Y. Gao, C. Meyer, M. Mesgar, and I. Gurevych. Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation. In International Joint Conference on Artificial Intelligence (IJCAI), 19th, pages 2350–2356, Macao, China, 2019. AAAI Press.

  40. [40]

    I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Nets. In Advances in Neural Information Processing Systems (NIPS), volume 27, pages 2672–2680, Montreal, QC, Canada, Dec. 2014. Curran Associates, Inc

  41. [41]

    A. Grissom II, H. He, J. Boyd-Graber, J. Morgan, and H. Daumé III. Don’t Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1342–1352, Doha, Qatar, Oct. 2014. Association for Computational Linguistics

  42. [42]

    J. Gu, G. Neubig, K. Cho, and V. O. Li. Learning to Translate in Real-Time with Neural Machine Translation. In Conference of the European Chapter of the Association for Computational Linguistics (EACL), volume 15th, pages 1053–1062, Valencia, Spain, Apr. 2017. Association for Computational Linguistics

  43. [43]

    H. Guo. Generating Text with Deep Reinforcement Learning. In NIPS Deep Reinforcement Learning Workshop, Montreal, QC, Canada, 2015

  44. [44]

    J. Guo, S. Lu, H. Cai, W. Zhang, Y. Yu, and J. Wang. Long Text Generation Via Adversarial Training with Leaked Information. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1):5141–5148, Apr. 2018

  45. [45]

    X. Guo, T. Klinger, C. Rosenbaum, J. P. Bigus, M. Campbell, B. Kawas, K. Talamadupula, G. Tesauro, and S. Singh. Learning to Query, Reason, and Answer Questions on Ambiguous Texts. In International Conference on Learning Representations (ICLR), Toulon, France, Apr. 2017

  46. [46]

    M. B. Hafez, C. Weber, M. Kerzel, and S. Wermter. Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning. Paladyn, Journal of Behavioral Robotics, 10(1):14–29, Jan. 2019

  47. [47]

    M. B. Hafez, C. Weber, M. Kerzel, and S. Wermter. Improving Robot Dual-System Motor Learning with Intrinsically Motivated Meta-Control and Latent-Space Experience Imagination. Robotics and Autonomous Systems, 133:103630, Nov. 2020

  48. [48]

    H. Hassan, A. Aue, C. Chen, V. Chowdhary, J. Clark, C. Federmann, X. Huang, M. Junczys-Dowmunt, W. Lewis, M. Li, S. Liu, T.-Y. Liu, R. Luo, A. Menezes, T. Qin, F. Seide, X. Tan, F. Tian, L. Wu, S. Wu, Y. Xia, D. Zhang, Z. Zhang, and M. Zhou. Achieving Human Parity on Automatic Chinese to English News Translation. arXiv:1803.05567 [cs], June 2018

  49. [49]

    D. He, H. Lu, Y. Xia, T. Qin, L. Wang, and T.-Y. Liu. Decoding with Value Networks for Neural Machine Translation. In International Conference on Neural Information Processing Systems (NIPS), volume 30th, pages 177–186, Long Beach, CA, USA, Dec. 2017. Curran Associates Inc

  50. [50]

    D. He, Y. Xia, T. Qin, L. Wang, N. Yu, T.-Y. Liu, and W.-Y. Ma. Dual Learning for Machine Translation. In Advances in Neural Information Processing Systems (NIPS), volume 29, pages 820–828, Barcelona, Spain, Dec. 2016

  51. [51]

    J. He, J. Chen, X. He, J. Gao, L. Li, L. Deng, and M. Ostendorf. Deep Reinforcement Learning with a Natural Language Action Space. In Annual Meeting of the Association for Computational Linguistics (ACL), volume 54, pages 1621–1630, Berlin, Germany, Aug. 2016. Association for Computational Linguistics.

  52. [52]

    J. He, M. Ostendorf, and X. He. Reinforcement Learning with External Knowledge and Two-Stage Q-Functions for Predicting Popular Reddit Threads. arXiv:1704.06217 [cs], Apr. 2017

  53. [53]

    J. He, M. Ostendorf, X. He, J. Chen, J. Gao, L. Li, and L. Deng. Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1838–1848, Austin, TX, USA, Nov. 2016. Association for Computational Linguistics

  54. [54]

    S. Heinrich, Y. Yao, T. Hinz, Z. Liu, T. Hummel, M. Kerzel, C. Weber, and S. Wermter. Crossmodal Language Grounding in an Embodied Neurocognitive Model. Frontiers in Neurorobotics, 14, 2020

  55. [55]

    J. Henderson, O. Lemon, and K. Georgila. Hybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Data Sets. Computational Linguistics, 34(4):487–511, July 2008

  56. [56]

    R. Higashinaka, M. Mizukami, K. Funakoshi, M. Araki, H. Tsukahara, and Y. Kobayashi. Fatal or Not? Finding Errors That Lead to Dialogue Breakdowns in Chat-Oriented Dialogue Systems. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2243–2248, Lisbon, Portugal, Sept. 2015. Association for Computational Linguistics

  57. [57]

    S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, Nov. 1997

  58. [58]

    W. J. Hutchins and H. L. Somers. An Introduction to Machine Translation. Academic Press, London, Apr. 1992

  59. [59]

    J. Jiang, A. Teichert, J. Eisner, and H. Daumé III. Learned Prioritization for Trading Off Accuracy and Speed. In Advances in Neural Information Processing Systems (NIPS), volume 25, Lake Tahoe, NV, USA, Dec. 2012

  60. [60]

    F. Jurcicek, B. Thomson, S. Keizer, F. Mairesse, M. Gasic, K. Yu, and S. J. Young. Natural Belief-Critic: A Reinforcement Algorithm for Parameter Estimation in Statistical Spoken Dialogue Systems. In Annual Conference of the International Speech Communication Association (INTERSPEECH), pages 90–93, Makuhari, Japan, Sept. 2010

  61. [61]

    N. Kalchbrenner and P. Blunsom. Recurrent Continuous Translation Models. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1700–1709, Seattle, WA, USA, Oct. 2013. Association for Computational Linguistics

  62. [62]

    Y. Keneshloo, T. Shi, N. Ramakrishnan, and C. K. Reddy. Deep Reinforcement Learning for Sequence-to-Sequence Models. IEEE Transactions on Neural Networks and Learning Systems, 31(7):2469–2489, July 2020

  63. [63]

    R. Kiros, Y. Zhu, R. R. Salakhutdinov, R. Zemel, R. Urtasun, A. Torralba, and S. Fidler. Skip-Thought Vectors. In Advances in Neural Information Processing Systems (NIPS), volume 28, pages 3294–3302, Montreal, QC, Canada, 2015. Curran Associates, Inc

  64. [64]

    P. Koehn. Statistical Machine Translation. Cambridge University Press, Cambridge; New York, Dec. 2009.

  65. [65]

    P. Koehn, F. J. Och, and D. Marcu. Statistical Phrase-Based Translation. In Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (HLT-NAACL), pages 48–54, Edmonton, AB, Canada, May 2003. Association for Computational Linguistics

  66. [66]

    S. Kübler, R. McDonald, and J. Nivre. Dependency Parsing. Synthesis Lectures on Human Language Technologies, 2(1):1–127, Dec. 2008

  67. [67]

    K. Kudashkina, P. M. Pilarski, and R. S. Sutton. Document-Editing Assistants and Model-Based Reinforcement Learning as a Path to Conversational AI. arXiv:2008.12095 [cs], Aug. 2020

  68. [68]

    T. K. Lam, S. Schamoni, and S. Riezler. Interactive-Predictive Neural Machine Translation Through Reinforcement and Imitation. In Proceedings of Machine Translation Summit XVII Volume 1: Research Track, pages 96–106, Dublin, Ireland, Aug. 2019. European Association for Machine Translation

  69. [69]

    J. Langford and T. Zhang. The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits. In Advances in Neural Information Processing Systems (NIPS), volume 20th, pages 817–824, Vancouver, BC, Canada, 2007. Curran Associates Inc

  70. [70]

    M. Lê and A. Fokkens. Tackling Error Propagation Through Reinforcement Learning: A Case of Greedy Dependency Parsing. In Conference of the European Chapter of the Association for Computational Linguistics (EACL), volume 1, pages 677–687, Valencia, Spain, Apr. 2017. Association for Computational Linguistics

  71. [71]

    Q. Le and T. Mikolov. Distributed Representations of Sentences and Documents. In International Conference on Machine Learning (ICML), volume 32nd, pages 1188–1196, Beijing, China, June 2014. PMLR

  72. [72]

    Y. LeCun, Y. Bengio, and G. Hinton. Deep Learning. Nature, 521(7553):436–444, May 2015

  73. [73]

    O. Lemon. Learning What to Say and How to Say It: Joint Optimisation of Spoken Dialogue Management and Natural Language Generation. Computer Speech & Language, 25(2):210–221, Apr. 2011

  74. [74]

    E. Levin, R. Pieraccini, and W. Eckert. A Stochastic Model of Human-Machine Interaction for Learning Dialog Strategies. IEEE Transactions on Speech and Audio Processing, 8(1):11–23, Jan. 2000

  75. [75]

    J. Li, W. Monroe, A. Ritter, M. Galley, J. Gao, and D. Jurafsky. Deep Reinforcement Learning for Dialogue Generation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1192–1202, Austin, TX, USA, Nov. 2016. Association for Computational Linguistics

  76. [76]

    L. Li, W. Chu, J. Langford, and R. E. Schapire. A Contextual-Bandit Approach to Personalized News Article Recommendation. In International Conference on World Wide Web (WWW), volume 19th, pages 661–670, Raleigh, NC, USA, Apr. 2010. Association for Computing Machinery

  77. [77]

    X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz. End-to-End Task-Completion Neural Dialogue Systems. In International Joint Conference on Natural Language Processing (IJCNLP), pages 733–743, Taipei, Taiwan, Nov. 2017. Asian Federation of Natural Language Processing

  78. [78]

    X. Li, Z. C. Lipton, B. Dhingra, L. Li, J. Gao, and Y.-N. Chen. A User Simulator for Task-Completion Dialogues. arXiv:1612.05688 [cs], Nov. 2017

  79. [79]

    Z. Li, X. Jiang, L. Shang, and H. Li. Paraphrase Generation with Deep Reinforcement Learning. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3865–3878, Brussels, Belgium, 2018. Association for Computational Linguistics

  80. [80]

    T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra. Continuous Control with Deep Reinforcement Learning. arXiv:1509.02971, Sept. 2015

Showing first 80 references.