POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning
Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.
Forward citations
Cited by 3 Pith papers
- An adaptive variance estimator for relative sparsity
  A new adaptive variance estimator for relative sparsity coefficients is introduced that fully utilizes the prior asymptotic normality theorem and incorporates variable selection effects.
- VentAgent: When LLMs Learn to Breathe -- Multi-Objective Arbitration for ARDS Ventilation
  VentAgent uses LLMs in a three-stage Perception-Planning-Orchestration hierarchy to perform multi-objective arbitration for mechanical ventilation in ARDS, outperforming RL baselines on a simulator while producing hum...
- Treatment, evidence, imitation, and chat
  LLMs cannot solve the medical treatment problem through imitation alone because it requires evidence from experiments or observations, posing ethical challenges for training such systems.