Off-policy deep reinforcement learning without exploration
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- COOPO: Cyclic Offline-Online Policy Optimization Algorithm
  COOPO is a cyclic offline-online RL algorithm that repeatedly anchors the policy to a dataset via KL-regularized updates, then fine-tunes it online. The authors claim better sample efficiency and monotonic improvement under coverage assumptions.
- On Safer Reinforcement Learning for Sedation and Analgesia in Intensive Care
  Offline RL for ICU sedation shows that adding 30-day mortality to the objective yields policies whose clinician agreement correlates negatively with mortality, unlike policies trained on a pain-only objective.