pith. sign in

arxiv: 2601.23154 · v2 · pith:TTYETDUVnew · submitted 2026-01-30 · 💻 cs.LG · cs.AI

On Safer Reinforcement Learning for Sedation and Analgesia in Intensive Care

Pith reviewed 2026-05-21 13:30 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords reinforcement learningintensive caresedationanalgesiamortalityoffline RLMIMIC-IVpatient safety
0
0 comments X

The pith

A reinforcement learning policy for ICU sedation that also targets lower mortality aligns with better survival than one focused only on pain reduction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors train offline deep reinforcement learning models on over 47,000 ICU stays to recommend hourly doses of opioids, propofol, benzodiazepines, and dexmedetomidine. One policy minimizes pain scores while the other jointly minimizes pain and 30-day post-discharge mortality, using recurrent networks to handle partial observability of patient state. Clinician actions that match the pain-only policy correlate with higher mortality, but actions that match the joint policy correlate with lower mortality, with the split driven by how each policy treats patients who have many coexisting conditions. A sympathetic reader would care because the result indicates that omitting survival from the objective can produce treatment suggestions whose real-world use tracks with worse outcomes.

Core claim

Using behavior-regularized actor-critic methods on MIMIC-IV data, the pain-only policy shows positive correlation between clinician agreement and mortality (ρ=0.119), while the joint pain-and-mortality policy shows negative correlation (ρ=-0.316); the difference arises because the two policies respond differently to high comorbidity levels.

What carries the argument

Offline behavior-regularized actor-critic model with recurrent state representations that outputs continuous medication doses, trained either to reduce pain alone or to reduce both pain and 30-day mortality.

If this is right

  • Including mortality in the reward changes how the policy responds to patients with high comorbidity burdens.
  • Clinician-policy agreement can serve as a retrospective proxy for safety evaluation in offline medical RL.
  • Short-term pain minimization alone can produce policies whose alignment with clinicians tracks higher mortality.
  • Joint optimization of immediate comfort and post-discharge survival may improve overall policy alignment with better outcomes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar multi-objective training could be tested in other time-critical ICU decisions such as ventilator weaning.
  • The observed divergence suggests that pain scores alone may be an incomplete signal for safe policy learning in critical care.
  • Extending the framework to include additional long-term outcomes like organ failure rates would be a natural next measurement.

Load-bearing premise

Retrospective clinician actions recorded in the MIMIC-IV cohort form a sufficiently rich and unbiased behavioral prior so that agreement with a learned policy can be read as evidence of policy safety.

What would settle it

A prospective randomized trial that measures 30-day mortality among patients whose care follows the joint policy versus the pain-only policy or usual clinician practice.

Figures

Figures reproduced from arXiv: 2601.23154 by Joel Romero-Hernandez, Oscar Camara.

Figure 1
Figure 1. Figure 1: Schematic of our data-driven reinforcement learning (RL) framework [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Our dataset included both quantitative and ordinal observations, apart [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 6
Figure 6. Figure 6: Policy B displays an overall decreasing trend in the association [PITH_FULL_IMAGE:figures/full_fig_p005_6.png] view at source ↗
Figure 4
Figure 4. Figure 4: Clinician-agent agreement for policies A ( [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 7
Figure 7. Figure 7: Elixhauser Comorbidity Index (ECI) distribution (left) and correlations [PITH_FULL_IMAGE:figures/full_fig_p005_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Correlations between Elixhauser Comorbidity Index and signed [PITH_FULL_IMAGE:figures/full_fig_p005_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Oral morphine-equivalent (OME) dosing for a test patient with an [PITH_FULL_IMAGE:figures/full_fig_p006_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Panels correspond to policy A (left) and policy B (right). Points [PITH_FULL_IMAGE:figures/full_fig_p006_10.png] view at source ↗
read the original abstract

Pain management in intensive care usually involves complex trade-offs, since both inadequate and excessive treatment can compromise patient safety. Prior work on reinforcement learning for sedation and analgesia has explored how to optimize these interventions, but has not considered patient survival or partial observability. To investigate the risks of these design choices, we developed an offline deep reinforcement learning framework that suggests hourly medication doses based on recurrent state representations. Using retrospective data from 47,144 ICU stays in the MIMIC-IV database, we trained and evaluated behavior-regularized actor-critic models that prescribe continuous doses of opioids, propofol, benzodiazepines, and dexmedetomidine according to two goals: reduce pain or jointly reduce pain and 30-day post-discharge mortality. Although the two resulting policies were associated with lower pain, clinician agreement with the pain-only policy was positively correlated with mortality ($\rho$=0.119, p<0.0001), while agreement with the joint policy was negatively correlated ($\rho$=-0.316, p<0.0001). We found that such divergence arose from a different response to high levels of comorbidity. This suggests that valuing post-discharge outcomes could be critical for learning safer treatment policies, even if a short-term goal remains the primary objective.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops an offline deep reinforcement learning framework using behavior-regularized actor-critic models on 47,144 MIMIC-IV ICU stays to prescribe continuous doses of opioids, propofol, benzodiazepines, and dexmedetomidine. It compares a pain-reduction policy against a joint pain-and-30-day-mortality policy, both using recurrent state representations to handle partial observability. The central empirical finding is that clinician agreement with the pain-only policy correlates positively with mortality (ρ=0.119, p<0.0001) while agreement with the joint policy correlates negatively (ρ=-0.316, p<0.0001), with the divergence attributed to differential responses to high comorbidity levels. The authors conclude that incorporating post-discharge mortality into the reward is critical for learning safer sedation policies.

Significance. If the reported correlations can be shown to reflect causal differences attributable to the reward design rather than unmeasured confounding, the work would provide concrete evidence that long-term survival objectives improve the safety profile of offline RL policies for ICU sedation. The use of recurrent representations to address partial observability and the scale of the MIMIC-IV cohort are strengths that could inform subsequent studies on reward specification in medical RL.

major comments (2)
  1. [Abstract and Results] The central claim that the sign flip in agreement-mortality correlations demonstrates safer policies from the joint reward rests on treating clinician-policy agreement as a valid safety proxy. However, no causal identification strategy, sensitivity analysis for unmeasured confounding, or external validation cohort is provided; the policies are learned offline from the same observational trajectories, and mortality is a terminal outcome influenced by many unrecorded factors (overall care quality, discharge decisions). This issue is load-bearing for the interpretation in the abstract and results.
  2. [Methods (policy learning and evaluation)] The behavior-regularized actor-critic objective is described as keeping the policy close to observed actions, but the manuscript does not demonstrate that this regularization breaks the confounding between agreement and mortality. The recurrent hidden state is only partially observed, and no explicit controls for selection bias or omitted-variable bias in the comorbidity-mortality relationship are reported.
minor comments (2)
  1. [Abstract] The abstract states the correlations but does not specify the exact regression model or covariates used to compute ρ; adding this detail would improve reproducibility.
  2. [Methods] Notation for the recurrent state representation and the behavior regularization coefficient should be defined consistently between the methods and results sections.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their insightful comments. We address the major concerns regarding causal interpretation and confounding below, and have updated the manuscript accordingly to clarify limitations and strengthen the analysis.

read point-by-point responses
  1. Referee: [Abstract and Results] The central claim that the sign flip in agreement-mortality correlations demonstrates safer policies from the joint reward rests on treating clinician-policy agreement as a valid safety proxy. However, no causal identification strategy, sensitivity analysis for unmeasured confounding, or external validation cohort is provided; the policies are learned offline from the same observational trajectories, and mortality is a terminal outcome influenced by many unrecorded factors (overall care quality, discharge decisions). This issue is load-bearing for the interpretation in the abstract and results.

    Authors: We concur that our findings are observational and that clinician agreement with the learned policies is used as a proxy for safety rather than establishing direct causality. The observed sign reversal in correlations indicates that the joint pain-and-mortality policy tends to disagree with clinicians in scenarios linked to higher mortality, particularly among patients with elevated comorbidity burdens. To address this, we have incorporated sensitivity analyses in the revised version, including stratification by key observed confounders such as comorbidity scores and proxies for care quality. We have also revised the abstract and results sections to frame the conclusions more cautiously as suggestive evidence requiring further validation. Unfortunately, an external validation cohort is not available in the current study, but we discuss this as a limitation. revision: partial

  2. Referee: [Methods (policy learning and evaluation)] The behavior-regularized actor-critic objective is described as keeping the policy close to observed actions, but the manuscript does not demonstrate that this regularization breaks the confounding between agreement and mortality. The recurrent hidden state is only partially observed, and no explicit controls for selection bias or omitted-variable bias in the comorbidity-mortality relationship are reported.

    Authors: The behavior regularization serves to anchor the policy within the distribution of observed clinician actions, thereby mitigating extrapolation risks in the offline setting. The agreement-mortality correlation is computed post-training by comparing policy recommendations to actual actions and correlating with patient outcomes. While this does not fully eliminate confounding, the differential behavior of the two policies stems directly from their reward functions, as evidenced by their contrasting handling of high-comorbidity cases. In revision, we have added regression models that control for observed comorbidities and other covariates to address omitted-variable bias, and we have clarified the partial observability of the recurrent state while noting its role in capturing temporal dynamics. revision: yes

standing simulated objections not resolved
  • Complete causal identification and external validation due to the observational nature of the MIMIC-IV data.

Circularity Check

0 steps flagged

No significant circularity; empirical correlations are independent of fitted parameters

full rationale

The paper's central results consist of post-training empirical correlations (ρ=0.119 and ρ=-0.316) between clinician agreement with learned policies and mortality, computed on MIMIC-IV trajectories after offline RL training. These quantities are obtained via standard statistical association on observed data and model outputs rather than by algebraic reduction to the behavior-regularized actor-critic objective or any self-citation. No load-bearing step equates a claimed prediction to its own inputs by construction, and the derivation relies on external data benchmarks rather than internal self-definition.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard RL assumptions plus several domain-specific modeling choices whose justification is not supplied in the abstract.

free parameters (2)
  • behavior regularization coefficient
    Controls how strongly the learned policy is pulled toward observed clinician actions; value chosen during training.
  • recurrent hidden state dimension
    Size of LSTM or GRU state representation; selected as hyperparameter.
axioms (2)
  • domain assumption The ICU state can be adequately summarized by the recurrent representation of observed vital signs, labs, and prior doses.
    Invoked to justify the use of recurrent networks for partial observability.
  • domain assumption Clinician agreement with a policy is a valid proxy for the policy's clinical safety.
    Underlies the interpretation of the reported correlations.

pith-pipeline@v0.9.0 · 5754 in / 1507 out tokens · 41048 ms · 2026-05-21T13:30:51.203375+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 7 internal anchors

  1. [1]

    Pain in the icu: a psychiatric perspective,

    P. N. Azzam and A. Alam, “Pain in the icu: a psychiatric perspective,” Journal of intensive care medicine, vol. 28, no. 3, pp. 140–150, 2013

  2. [2]

    Pain management in the surgical icu patient,

    J. A. Harvin and L. S. Kao, “Pain management in the surgical icu patient,”Current opinion in critical care, vol. 26, no. 6, pp. 628–633, 2020

  3. [3]

    The critical care crisis of opioid overdoses in the united states,

    J. P. Stevens, M. J. Wall, L. Novack, J. Marshall, D. J. Hsu, and M. D. Howell, “The critical care crisis of opioid overdoses in the united states,” Annals of the American Thoracic Society, vol. 14, no. 12, pp. 1803–1809, 2017

  4. [4]

    Effect of analgesic treatment on the physiological consequences of acute pain,

    K. S. Lewis, J. K. Whipple, K. A. Michael, and E. J. Quebbeman, “Effect of analgesic treatment on the physiological consequences of acute pain,”American Journal of Health-System Pharmacy, vol. 51, no. 12, pp. 1539–1554, 1994

  5. [5]

    Pain and satisfaction with pain control in seriously ill hospitalized adults: findings from the support research investigations,

    N. A. Desbiens, A. W. Wu, S. K. Broste, N. S. Wenger, A. F. Connors, J. Lynn, Y . Yasui, R. S. Phillips, and W. Fulkerson, “Pain and satisfaction with pain control in seriously ill hospitalized adults: findings from the support research investigations,”Critical care medicine, vol. 24, no. 12, pp. 1953–1961, 1996

  6. [6]

    Prevalence, location, and characteristics of chronic pain in intensive care survivors,

    A. K. Langerud, T. Rustøen, C. Brunborg, U. Kongsgaard, and A. Stub- haug, “Prevalence, location, and characteristics of chronic pain in intensive care survivors,”Pain Management Nursing, vol. 19, no. 4, pp. 366–376, 2018

  7. [7]

    Does the impact of the type of anesthesia on outcomes differ by patient age and comorbidity burden?

    S. G. Memtsoudis, R. Rasul, S. Suzuki, J. Poeran, T. Danninger, C. Wu, M. Mazumdar, and V . V ougioukas, “Does the impact of the type of anesthesia on outcomes differ by patient age and comorbidity burden?” Regional Anesthesia & Pain Medicine, vol. 39, no. 2, pp. 112–119, 2014

  8. [8]

    Unrecognised, undertreated, pain in icu—causes, effects, and how to do better,

    S. Alderson and S. Mckechnie, “Unrecognised, undertreated, pain in icu—causes, effects, and how to do better,”Open Journal of Nursing, vol. 3, no. 1, pp. 108–113, 2013

  9. [9]

    Relieving pain in america: a blueprint for transforming prevention, care, education, and research,

    L. S. Simon, “Relieving pain in america: a blueprint for transforming prevention, care, education, and research,”Journal of pain & palliative care pharmacotherapy, vol. 26, no. 2, pp. 197–198, 2012

  10. [10]

    The economic costs of pain in the united states,

    D. J. Gaskin and P. Richard, “The economic costs of pain in the united states,”The journal of pain, vol. 13, no. 8, pp. 715–724, 2012

  11. [11]

    Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

    S. Levine, A. Kumar, G. Tucker, and J. Fu, “Offline reinforcement learning: Tutorial, review, and perspectives on open problems,”arXiv preprint arXiv:2005.01643, 2020

  12. [12]

    Reinforcement learning for clinical decision support in critical care: comprehensive review,

    S. Liu, K. C. See, K. Y . Ngiam, L. A. Celi, X. Sun, and M. Feng, “Reinforcement learning for clinical decision support in critical care: comprehensive review,”Journal of medical Internet research, vol. 22, no. 7, p. e18477, 2020

  13. [13]

    Optimal medication dosing from suboptimal clinical examples: A deep reinforcement learn- ing approach,

    S. Nemati, M. M. Ghassemi, and G. D. Clifford, “Optimal medication dosing from suboptimal clinical examples: A deep reinforcement learn- ing approach,” in2016 38th annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, 2016, pp. 2978–2981

  14. [14]

    Deep Reinforcement Learning for Sepsis Treatment

    A. Raghu, M. Komorowski, I. Ahmed, L. Celi, P. Szolovits, and M. Ghassemi, “Deep reinforcement learning for sepsis treatment,”arXiv preprint arXiv:1711.09602, 2017

  15. [15]

    Representation and Reinforcement Learning for Personalized Glycemic Control in Septic Patients

    W.-H. Weng, M. Gao, Z. He, S. Yan, and P. Szolovits, “Representation and reinforcement learning for personalized glycemic control in septic patients,”arXiv preprint arXiv:1712.00654, 2017

  16. [16]

    Electronic health records based reinforcement learning for treatment optimizing,

    T. Li, Z. Wang, W. Lu, Q. Zhang, and D. Li, “Electronic health records based reinforcement learning for treatment optimizing,”Information Systems, vol. 104, p. 101878, 2022

  17. [17]

    Reinforcement learning assisted oxygen therapy for covid-19 patients under intensive care,

    H. Zheng, J. Zhu, W. Xie, and J. Zhong, “Reinforcement learning assisted oxygen therapy for covid-19 patients under intensive care,”BMC medical informatics and decision making, vol. 21, no. 1, p. 350, 2021

  18. [18]

    A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in Intensive Care Units

    N. Prasad, L.-F. Cheng, C. Chivers, M. Draugelis, and B. E. Engelhardt, “A reinforcement learning approach to weaning of mechanical ventila- tion in intensive care units,”arXiv preprint arXiv:1704.06300, 2017

  19. [19]

    Inverse reinforcement learning for intelli- gent mechanical ventilation and sedative dosing in intensive care units,

    C. Yu, J. Liu, and H. Zhao, “Inverse reinforcement learning for intelli- gent mechanical ventilation and sedative dosing in intensive care units,” BMC medical informatics and decision making, vol. 19, no. 2, pp. 111– 120, 2019

  20. [20]

    Supervised-actor-critic reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units,

    C. Yu, G. Ren, and Y . Dong, “Supervised-actor-critic reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units,”BMC medical informatics and decision making, vol. 20, no. 3, pp. 1–8, 2020

  21. [21]

    Deep reinforcement learning for optimal critical care pain management with morphine using dueling double-deep q networks,

    D. Lopez-Martinez, P. Eschenfeldt, S. Ostvar, M. Ingram, C. Hur, and R. Picard, “Deep reinforcement learning for optimal critical care pain management with morphine using dueling double-deep q networks,” in 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2019, pp. 3960–3963

  22. [22]

    Patient-specific sedation management via deep reinforcement learning,

    N. Eghbali, T. Alhanai, and M. M. Ghassemi, “Patient-specific sedation management via deep reinforcement learning,”Frontiers in Digital Health, vol. 3, p. 608893, 2021

  23. [23]

    Reinforcement learning approach to sedation and delirium man- agement in the intensive care unit,

    ——, “Reinforcement learning approach to sedation and delirium man- agement in the intensive care unit,” in2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), 2023, pp. 1–5

  24. [24]

    R. S. Sutton and A. G. Barto,Reinforcement learning: An introduction. MIT press, 2018

  25. [25]

    Deterministic policy gradient algorithms,

    D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, “Deterministic policy gradient algorithms,” inInternational conference on machine learning. Pmlr, 2014, pp. 387–395

  26. [26]

    Continuous control with deep reinforcement learning

    T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y . Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,”arXiv preprint arXiv:1509.02971, 2015

  27. [27]

    Off-policy deep reinforcement learning without exploration,

    S. Fujimoto, D. Meger, and D. Precup, “Off-policy deep reinforcement learning without exploration,” inInternational conference on machine learning. PMLR, 2019, pp. 2052–2062

  28. [28]

    Issues in using function approximation for reinforcement learning,

    S. Thrun and A. Schwartz, “Issues in using function approximation for reinforcement learning,” inProceedings of the 1993 connectionist models summer school. Psychology Press, 2014, pp. 255–263

  29. [29]

    Deep reinforcement learning with double q-learning,

    H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double q-learning,” inProceedings of the AAAI conference on artificial intelligence, vol. 30, no. 1, 2016

  30. [30]

    Stabilizing off- policy q-learning via bootstrapping error reduction,

    A. Kumar, J. Fu, M. Soh, G. Tucker, and S. Levine, “Stabilizing off- policy q-learning via bootstrapping error reduction,”Advances in neural information processing systems, vol. 32, 2019

  31. [31]

    Conservative q-learning for offline reinforcement learning,

    A. Kumar, A. Zhou, G. Tucker, and S. Levine, “Conservative q-learning for offline reinforcement learning,”Advances in Neural Information Processing Systems, vol. 33, pp. 1179–1191, 2020

  32. [32]

    Planning and acting in partially observable stochastic domains,

    L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and acting in partially observable stochastic domains,”Artificial intelligence, vol. 101, no. 1-2, pp. 99–134, 1998

  33. [33]

    Mimic- iv, a freely accessible electronic health record dataset,

    A. E. Johnson, L. Bulgarelli, L. Shen, A. Gayles, A. Shammout, S. Horng, T. J. Pollard, S. Hao, B. Moody, B. Gowet al., “Mimic- iv, a freely accessible electronic health record dataset,”Scientific data, vol. 10, no. 1, p. 1, 2023

  34. [34]

    A synthesis of oral morphine equivalents (ome) for opioid utilisation studies,

    S. Nielsen, L. Degenhardt, B. Hoban, and N. Gisev, “A synthesis of oral morphine equivalents (ome) for opioid utilisation studies,” Pharmacoepidemiology and drug safety, vol. 25, no. 6, pp. 733–737, 2016

  35. [35]

    Propofol: a review of its use in intensive care sedation of adults,

    K. McKeage and C. M. Perry, “Propofol: a review of its use in intensive care sedation of adults,”CNS drugs, vol. 17, no. 4, pp. 235–272, 2003

  36. [36]

    Acute withdrawal syndrome related to the administration of analgesic and sedative medications in adult intensive care unit patients,

    W. B. Cammarano, J.-F. Pittet, S. Weitz, R. M. Schlobohm, and J. D. Marks, “Acute withdrawal syndrome related to the administration of analgesic and sedative medications in adult intensive care unit patients,” Critical care medicine, vol. 26, no. 4, pp. 676–684, 1998

  37. [37]

    A double-blind, randomized comparison of iv lorazepam versus midazolam for sedation of icu patients via a pharmacologic model,

    J. Barr, K. Zomorodi, E. J. Bertaccini, S. L. Shafer, and E. Geller, “A double-blind, randomized comparison of iv lorazepam versus midazolam for sedation of icu patients via a pharmacologic model,”The Journal of the American Society of Anesthesiologists, vol. 95, no. 2, pp. 286–298, 2001

  38. [38]

    Dexmedetomidine,

    N. Bhana, K. L. Goa, and K. J. McClellan, “Dexmedetomidine,”Drugs, vol. 59, no. 2, pp. 263–268, 2000

  39. [39]

    An extensive data processing pipeline for mimic-iv,

    M. Gupta, B. Gallamoza, N. Cutrona, P. Dhakal, R. Poulain, and R. Beheshti, “An extensive data processing pipeline for mimic-iv,” in Machine learning for health. PMLR, 2022, pp. 311–325

  40. [40]

    Detecting hazardous intensive care patient episodes using real- time mortality models,

    C. Hug, “Detecting hazardous intensive care patient episodes using real- time mortality models,” Ph.D. dissertation, Massachusetts Institute of Technology, 2009

  41. [41]

    Multiple imputation using chained equations: issues and guidance for practice,

    I. R. White, P. Royston, and A. M. Wood, “Multiple imputation using chained equations: issues and guidance for practice,”Statistics in medicine, vol. 30, no. 4, pp. 377–399, 2011

  42. [42]

    Gradient boosting machines, a tutorial,

    A. Natekin and A. Knoll, “Gradient boosting machines, a tutorial,” Frontiers in neurorobotics, vol. 7, p. 21, 2013

  43. [43]

    What are decision trees?

    C. Kingsford and S. L. Salzberg, “What are decision trees?”Nature biotechnology, vol. 26, no. 9, pp. 1011–1013, 2008

  44. [44]

    Xgboost: A scalable tree boosting system,

    T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” inProceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, 2016, pp. 785–794

  45. [45]

    Predicting mortality of patients with acute kidney injury in the icu using xgboost model,

    J. Liu, J. Wu, S. Liu, M. Li, K. Hu, and K. Li, “Predicting mortality of patients with acute kidney injury in the icu using xgboost model,”Plos one, vol. 16, no. 2, p. e0246306, 2021

  46. [46]

    On the Properties of Neural Machine Translation: Encoder-Decoder Approaches

    K. Cho, B. Van Merriënboer, D. Bahdanau, and Y . Bengio, “On the properties of neural machine translation: Encoder-decoder approaches,” arXiv preprint arXiv:1409.1259, 2014

  47. [47]

    Memory-based control with recurrent neural networks

    N. Heess, J. J. Hunt, T. P. Lillicrap, and D. Silver, “Memory-based con- trol with recurrent neural networks,”arXiv preprint arXiv:1512.04455, 2015

  48. [48]

    Addressing function approxi- mation error in actor-critic methods,

    S. Fujimoto, H. Hoof, and D. Meger, “Addressing function approxi- mation error in actor-critic methods,” inInternational conference on machine learning. PMLR, 2018, pp. 1587–1596

  49. [49]

    A minimalist approach to offline reinforce- ment learning,

    S. Fujimoto and S. S. Gu, “A minimalist approach to offline reinforce- ment learning,”Advances in neural information processing systems, vol. 34, pp. 20 132–20 145, 2021

  50. [50]

    Sedation in the in- tensive care unit with remifentanil/propofol versus midazolam/fentanyl: a randomised, open-label, pharmacoeconomic trial,

    B. Muellejans, T. Matthey, J. Scholpp, and M. Schill, “Sedation in the in- tensive care unit with remifentanil/propofol versus midazolam/fentanyl: a randomised, open-label, pharmacoeconomic trial,”Critical Care, vol. 10, no. 3, p. R91, 2006

  51. [51]

    F. W. Rozendaal, P. E. Spronk, F. F. Snellen, A. Schoen, A. R. van Zanten, N. A. Foudraine, P. G. Mulder, J. Bakker, and other UltiSAFE in- vestigators, “Remifentanil-propofol analgo-sedation shortens duration of ventilation and length of icu stay compared to a conventional regimen: a centre randomised, cross-over, open-label study in the netherlands,” In...

  52. [52]

    Impact of an analgesia-based sedation protocol on mechanically ventilated patients in a medical intensive care unit,

    A. C. Faust, P. Rajan, L. A. Sheperd, C. A. Alvarez, P. McCorstin, and R. L. Doebele, “Impact of an analgesia-based sedation protocol on mechanically ventilated patients in a medical intensive care unit,” Anesthesia & Analgesia, vol. 123, no. 4, pp. 903–909, 2016