arxiv: 1701.07274 · v6 · pith:SHSAYRXLnew · submitted 2017-01-25 · 💻 cs.LG

Deep Reinforcement Learning: An Overview

classification 💻 cs.LG
keywords learning, deep, discuss, including, reinforcement, systems, applications, computer

We give an overview of recent exciting achievements of deep reinforcement learning (RL). We discuss six core elements, six important mechanisms, and twelve applications. We start with background on machine learning, deep learning, and reinforcement learning. Next we discuss core RL elements: value function (in particular, Deep Q-Network (DQN)), policy, reward, model, planning, and exploration. After that, we discuss important mechanisms for RL: attention and memory, unsupervised learning, transfer learning, multi-agent RL, hierarchical RL, and learning to learn. Then we discuss various applications of RL: games (in particular, AlphaGo); robotics; natural language processing (including dialogue systems, machine translation, and text generation); computer vision; neural architecture design; business management; finance; healthcare; Industry 4.0; smart grid; intelligent transportation systems; and computer systems. We mention topics not reviewed yet, and list a collection of RL resources. After presenting a brief summary, we close with discussions. Please see Deep Reinforcement Learning, arXiv:1810.06339, for a significant update.

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Why Does Agentic Safety Fail to Generalize Across Tasks?

    cs.LG 2026-05 conditional novelty 6.0

    Agentic safety fails to generalize across tasks because the task-to-safe-controller mapping has a higher Lipschitz constant than the task-to-controller mapping alone, as proven in linear-quadratic control and demonstr...

  2. Exploring the Secondary Risks of Large Language Models

    cs.LG 2025-06 unverdicted novelty 6.0

    Introduces secondary risks as a new class of LLM failures from benign prompts, defines two primitives, proposes SecLens search framework, and releases SecRiskBench showing risks are widespread across 16 models.

  3. Anti-bullying Adaptive Cruise Control: A proactive right-of-way protection approach

    eess.SY 2024-12 unverdicted novelty 6.0

    AACC combines online IOC for driving style identification with a Stackelberg game planner to proactively protect right-of-way against cut-ins, reporting up to 79.8% safety gains in simulation.

  4. Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    eess.AS 2024-06 unverdicted novelty 6.0

    Seed-TTS models produce speech matching human naturalness and speaker similarity, with added controllability via self-distillation and reinforcement learning.

  5. A Survey on Vision-Language-Action Models for Embodied AI

    cs.RO 2024-05 unverdicted novelty 6.0

    This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.

  6. An Inductive Synthesis Framework for Verifiable Reinforcement Learning

    cs.LG 2019-07 unverdicted novelty 6.0

    The paper introduces an inductive synthesis framework that generates verifiable deterministic program approximations of neural RL policies, preserving safety invariants via counterexample-guided search over state tran...

  7. CoAX: Cognitive-Oriented Attribution eXplanation User Model of Human Understanding of AI Explanations

    cs.AI 2026-04 unverdicted novelty 5.0

    Cognitive models of user reasoning strategies with XAI methods on tabular data fit human forward-simulation decisions better than ML baselines and support hypothesis testing without new user studies.

  8. Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs

    cs.CL 2025-10 unverdicted novelty 5.0

    ERL trains LLMs to erase faulty reasoning steps and regenerate them in place, yielding gains of up to 8.48% EM on multi-hop QA benchmarks like HotpotQA.

  9. The Rise and Potential of Large Language Model Based Agents: A Survey

    cs.AI 2023-09 accept novelty 4.0

    The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

  10. Deep Reinforcement Learning for Personalized Search Story Recommendation

    cs.LG 2019-07 unverdicted novelty 3.0

    A deep RL architecture using imitation learning and reinforcement learning is proposed to model immediate and future values of search story recommendations in a Markov decision process framework.