Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey

Kee Yuan Ngiam; Mengling Feng; Siqi Liu

arXiv: 1907.09475 · v1 · pith:IY4JVZIZnew · submitted 2019-07-22 · cs.LG · cs.AI · stat.ML

Pith reviewed 2026-05-24 17:53 UTC · model grok-4.3

classification: cs.LG · cs.AI · stat.ML

keywords: deep reinforcement learning · clinical decision support · survey · deep neural networks · medical applications · algorithm selection

The pith

This survey, presented by its authors as the first of its kind, summarizes deep reinforcement learning algorithms for clinical decision support and offers a preliminary guide for selecting among them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper observes that deep reinforcement learning (DRL) has reached human-level or better performance in games and computer vision but is still rarely used in clinical decision optimization, and it addresses this gap by reviewing the available literature. It covers case studies that apply different DRL methods to clinical challenges such as treatment optimization and patient management. The authors compare advantages and limitations across algorithms and supply a preliminary guide for matching algorithms to specific clinical tasks. A reader would care because the survey gathers scattered early applications into one place and gives practitioners a starting point for choosing methods.

Core claim

The authors present the first survey of reinforcement learning algorithms paired with deep neural networks for clinical decision support, including case studies of applications to medical challenges, direct comparisons of algorithm strengths and weaknesses, and a preliminary guide for selecting the right algorithm for a given clinical problem.

What carries the argument

The preliminary guide for choosing the appropriate DRL algorithm for particular clinical applications, which rests on the comparisons of advantages and limitations across reviewed methods and case studies.

If this is right

DRL algorithms can be applied to a range of clinical challenges as shown by the reviewed case studies.
Each DRL algorithm carries distinct advantages and limitations that affect its fit for medical tasks.
The selection guide provides a structured way to match algorithms to clinical needs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

As more DRL clinical applications appear, the survey and guide would need periodic updates to stay current.
The comparisons could help identify which algorithm families deserve priority for further clinical testing.
The rarity of adoption noted in the survey points to open questions about data requirements and safety validation in real medical workflows.

Load-bearing premise

The papers chosen for review accurately represent the current state of DRL use in clinical settings without important omissions or selection bias.

What would settle it

Publication of any earlier survey on DRL for clinical decision support or discovery of multiple significant applications that were omitted from the review.

Figures

Figures reproduced from arXiv: 1907.09475 by Kee Yuan Ngiam, Mengling Feng, Siqi Liu.

**Figure 1.** MDP for policy gradient RL algorithm.

… maximum for Q-function.

$$
\pi'(a_t, s_t) =
\begin{cases}
1, & \text{if } a_t = \arg\max_{a_t} Q^{\pi}(s_t, a_t), \\
0, & \text{otherwise}
\end{cases}
\tag{4}
$$

These two algorithms are called Fitted Value Iteration (FVI) (Munos and Szepesvári, 2008) and Fitted Q Iteration (FQI) (Riedmiller, 2005), respectively. A special case of FQI is called Q-learning (Watkins and Dayan, 1992), where in step 1 we take only one tuple of sample $(s_t, a_t, \ldots$ …
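The greedy policy of equation (4) and the single-tuple Q-learning case mentioned in the excerpt can be illustrated with a small tabular sketch. This is not code from the paper: the function names `greedy_policy` and `q_learning_step`, the toy table size, the hyperparameters `alpha` and `gamma`, and the tuple layout `(s, a, r, s_next)` are all assumptions, since the excerpt truncates the tuple. The update shown is the standard Watkins Q-learning rule.

```python
import numpy as np


def greedy_policy(q_table: np.ndarray) -> np.ndarray:
    """One-hot policy: 1 for the argmax action of each state, 0 otherwise (cf. eq. 4)."""
    n_states, _ = q_table.shape
    policy = np.zeros_like(q_table, dtype=float)
    # Ties are broken in favour of the lowest action index (np.argmax behaviour).
    policy[np.arange(n_states), q_table.argmax(axis=1)] = 1.0
    return policy


def q_learning_step(q_table, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Update Q(s, a) in place from a single (s, a, r, s_next) tuple (assumed layout)."""
    target = r + gamma * q_table[s_next].max()
    q_table[s, a] += alpha * (target - q_table[s, a])
    return q_table


# Toy example with 3 states and 2 actions; all numbers are illustrative only.
q = np.zeros((3, 2))
q = q_learning_step(q, s=0, a=1, r=1.0, s_next=2)
pi = greedy_policy(q)
```

After the single update, only state 0 has a non-zero Q-value, so the greedy policy picks action 1 there and falls back to action 0 in the untouched states.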

Original abstract

Owing to the recent advancements in Artificial Intelligence, especially deep learning, many data-driven decision support systems have been implemented to assist medical doctors in delivering personalized care. We focus on the deep reinforcement learning (DRL) models in this paper. DRL models have demonstrated human-level or even superior performance in the tasks of computer vision and game playing, such as Go and Atari games. However, the adoption of deep reinforcement learning techniques in clinical decision optimization is still rare. We present the first survey that summarizes reinforcement learning algorithms with Deep Neural Networks (DNN) on clinical decision support. We also discuss some case studies, where different DRL algorithms were applied to address various clinical challenges. We further compare and contrast the advantages and limitations of various DRL algorithms and present a preliminary guide on how to choose the appropriate DRL algorithm for particular clinical applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript is a brief survey on deep reinforcement learning (DRL) for clinical decision support. It claims to be the first survey summarizing DRL algorithms with deep neural networks in this domain, reviews relevant algorithms, presents case studies addressing clinical challenges, compares advantages and limitations of the algorithms, and offers a preliminary guide for selecting DRL methods for particular clinical applications.

Significance. If the survey is comprehensive and its novelty claim is substantiated, the work would be significant as an organizing reference for an emerging interdisciplinary area, helping researchers identify suitable DRL approaches for healthcare optimization tasks where data-driven personalization is needed.

major comments (2)

[Abstract] The central claim that the paper 'present[s] the first survey that summarizes reinforcement learning algorithms with Deep Neural Networks (DNN) on clinical decision support' is unsupported by any account of the literature search methodology (databases, keywords, inclusion/exclusion criteria, date cutoff, or verification that no prior surveys exist). This is load-bearing for the contribution.
[Case studies / literature review sections] The manuscript provides no discussion of how the reviewed literature and case studies were selected or why they are representative of DRL applications in clinical settings, leaving the weakest assumption (absence of major selection bias) unaddressed.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We agree that the novelty claim and literature selection process require explicit documentation to strengthen the manuscript. We will revise accordingly.

Referee: [Abstract] The central claim that the paper 'present[s] the first survey that summarizes reinforcement learning algorithms with Deep Neural Networks (DNN) on clinical decision support' is unsupported by any account of the literature search methodology (databases, keywords, inclusion/exclusion criteria, date cutoff, or verification that no prior surveys exist). This is load-bearing for the contribution.

Authors: We acknowledge that the original manuscript does not describe the literature search process. To substantiate the claim of being the first such survey, the revised version will add a new subsection (e.g., in Section 1 or a dedicated 'Literature Search Methodology' paragraph) that specifies: (1) databases queried (PubMed, IEEE Xplore, Google Scholar, arXiv), (2) search strings (e.g., ('deep reinforcement learning' OR 'deep RL') AND ('clinical decision support' OR 'clinical decision making' OR 'medical decision optimization')), (3) inclusion criteria (peer-reviewed papers or preprints using DNN-based RL for clinical tasks, published up to the submission date), (4) exclusion criteria (purely theoretical RL without clinical application, non-deep methods), and (5) date cutoff. We will also state that, to the best of our knowledge after performing this search, no prior survey comprehensively covered DRL with DNNs specifically for clinical decision support.

Revision: yes

Referee: [Case studies / literature review sections] The manuscript provides no discussion of how the reviewed literature and case studies were selected or why they are representative of DRL applications in clinical settings, leaving the weakest assumption (absence of major selection bias) unaddressed.

Authors: We agree that the absence of selection criteria leaves the representativeness open to question. In the revision we will insert a dedicated paragraph (likely in Section 2 or a new 'Scope and Selection of Case Studies' subsection) that explains: (a) the criteria used to choose the reviewed algorithms and case studies (relevance to real clinical challenges, explicit use of deep neural networks within the RL framework, coverage of different medical domains such as ICU, oncology, and chronic disease management, and publication impact), (b) the rationale for why the selected examples are representative (they illustrate the main algorithmic families, namely value-based, policy-gradient, and actor-critic, and the primary clinical hurdles, namely sparsity of rewards, safety constraints, and off-policy evaluation), and (c) an explicit statement acknowledging that the survey is not exhaustive but focuses on illustrative, high-impact applications to provide guidance rather than a complete census.

Revision: yes

Circularity Check

0 steps flagged

No circularity: survey paper contains no derivations or predictions

The paper is a literature survey whose central claim is the presentation of a summary of existing DRL work in clinical settings. No equations, fitted parameters, predictions, or derivation chains appear in the provided text. The 'first survey' assertion is a factual claim of novelty rather than a mathematical reduction to inputs by construction. No self-citations, ansatzes, or renamings of the enumerated kinds are present. The content is self-contained as a descriptive review without internal circular structure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper; no free parameters, axioms, or invented entities are introduced in the central claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · 8 internal anchors

[1]

Asynchronous Advantage Actor-Critic Agent for Starcraft II

Basel Alghanem et al. Asynchronous advantage actor-critic agent for starcraft ii. arXiv preprint arXiv:1807.08217,

[2]

A Brief Survey of Deep Reinforcement Learning

Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. A brief survey of deep reinforcement learning. arXiv preprint arXiv:1708.05866,

[3]

Large-scale machine learning with stochastic gradient descent

Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT'2010, pages 177–186. Springer,

[4]

Multi-column Deep Neural Networks for Image Classification

Dan Cireşan, Ueli Meier, and Jürgen Schmidhuber. Multi-column deep neural networks for image classification. arXiv preprint arXiv:1202.2745,

[5]

Deep recurrent q-learning for partially observable mdps

Matthew Hausknecht and Peter Stone. Deep recurrent q-learning for partially observable mdps. In 2015 AAAI Fall Symposium Series,

[6]

Deep Variational Reinforcement Learning for POMDPs

Maximilian Igl, Luisa Zintgraf, Tuan Anh Le, Frank Wood, and Shimon Whiteson. Deep variational reinforcement learning for pomdps. arXiv preprint arXiv:1806.02426,

[7]

Optimal and autonomous control using reinforcement learning: A survey

Bahare Kiumarsi, Kyriakos G Vamvoudakis, Hamidreza Modares, and Frank L Lewis. Optimal and autonomous control using reinforcement learning: A survey. IEEE transactions on neural networks and learning systems, 29(6):2042–2062,

[8]

Data-efficient reinforcement learning in continuous state-action gaussian-pomdps

Rowan McAllister and Carl Edward Rasmussen. Data-efficient reinforcement learning in continuous state-action gaussian-pomdps. In Advances in Neural Information Processing Systems, pages 2040–2049,

[9]

Playing Atari with Deep Reinforcement Learning

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602,

[10]

Asynchronous methods for deep reinforcement learning

Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International conference on machine learning, pages 1928–1937,

[11]

A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in Intensive Care Units

Niranjani Prasad, Li-Fang Cheng, Corey Chivers, Michael Draugelis, and Barbara E Engelhardt. A reinforcement learning approach to weaning of mechanical ventilation in intensive care units. arXiv preprint arXiv:1704.06300,

[12]

Continuous State-Space Models for Optimal Sepsis Treatment - a Deep Reinforcement Learning Approach

Aniruddh Raghu, Matthieu Komorowski, Leo Anthony Celi, Peter Szolovits, and Marzyeh Ghassemi. Continuous state-space models for optimal sepsis treatment-a deep reinforcement learning approach. arXiv preprint arXiv:1705.08422,

[13]

Overview of the trec 2014 clinical decision support track

Matthew S Simpson, Ellen M Voorhees, and William Hersh. Overview of the trec 2014 clinical decision support track. Technical report, Lister Hill National Center for Biomedical Communications, Bethesda, MD,

[14]

Issues in using function approximation for reinforcement learning

Sebastian Thrun and Anton Schwartz. Issues in using function approximation for reinforcement learning. In Proceedings of the 1993 Connectionist Models Summer School Hillsdale, NJ. Lawrence Erlbaum,

[15]

Dueling Network Architectures for Deep Reinforcement Learning

Ziyu Wang, Tom Schaul, Matteo Hessel, Hado Van Hasselt, Marc Lanctot, and Nando De Freitas. Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581,
