POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning

Finale Doshi-Velez; Joseph Futoma; Michael C. Hughes

arxiv: 2001.04032 · v2 · pith:V7FQN5EBnew · submitted 2020-01-13 · 📊 stat.ML · cs.LG

keywords data · decision-making · medical · observed · partially · planning · when · approach

Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

An adaptive variance estimator for relative sparsity
stat.ME 2026-05 unverdicted novelty 6.0

A new adaptive variance estimator for relative sparsity coefficients is introduced that fully utilizes the prior asymptotic normality theorem and incorporates variable selection effects.

VentAgent: When LLMs Learn to Breathe -- Multi-Objective Arbitration for ARDS Ventilation
cs.LG 2026-06 unverdicted novelty 5.0

VentAgent uses LLMs in a three-stage Perception-Planning-Orchestration hierarchy to perform multi-objective arbitration for mechanical ventilation in ARDS, outperforming RL baselines on a simulator while producing hum...

Treatment, evidence, imitation, and chat
stat.OT 2025-06 unverdicted novelty 4.0

LLMs cannot solve the medical treatment problem through imitation alone because it requires evidence from experiments or observations, posing ethical challenges for training such systems.