Inpatient Overflow Management with Proximal Policy Optimization
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Problem Definition: Managing inpatient flow in large hospital systems is challenging due to the complexity of assigning randomly arriving patients -- either waiting for primary units or being overflowed to alternative units. Current practices rely on ad-hoc rules, while prior analytical approaches struggle with the intractably large state and action spaces inherent in patient-unit matching. Scalable decision support is needed to optimize overflow management while accounting for time-periodic fluctuations in patient flow.
Methodology/Results: We develop a scalable decision-making framework using Proximal Policy Optimization (PPO) to optimize overflow decisions in a time-periodic, long-run average cost setting. To address the combinatorial complexity, we introduce atomic actions, which decompose multi-patient routing into sequential assignments. We further enhance computational efficiency through a partially-shared policy network designed to balance parameter sharing with time-specific policy adaptations, and a queueing-informed value function approximation to improve policy evaluation. Our method significantly reduces the need for extensive simulation data, a common limitation in reinforcement learning applications. Case studies on hospital systems with up to twenty patient classes and twenty wards demonstrate that our approach matches or outperforms existing benchmarks, including approximate dynamic programming, which is computationally infeasible beyond five wards.
Managerial Implications: Our framework offers a scalable, efficient, and explainable solution for managing patient flow in complex hospital systems. More broadly, our results highlight that domain-aware adaptation is more critical to improving algorithm performance than fine-tuning neural network parameters when applying general-purpose algorithms to specific applications.
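The atomic-action idea in the abstract (decomposing a multi-patient routing decision into sequential single-patient assignments) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `route_sequentially`, `greedy_policy`, and `WAIT`, the integer encoding of patient classes and wards, and the toy greedy rule standing in for a learned PPO policy are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

WAIT = -1  # hypothetical marker: the patient keeps waiting for the primary unit


def route_sequentially(
    waiting: List[int],
    free_beds: Dict[int, int],
    atomic_policy: Callable[[int, Dict[int, int]], int],
) -> List[Tuple[int, int]]:
    """Make one atomic decision per waiting patient, updating bed counts in between."""
    beds = dict(free_beds)  # work on a copy so the caller's state is untouched
    decisions = []
    for patient_class in waiting:
        ward = atomic_policy(patient_class, beds)
        if ward != WAIT and beds.get(ward, 0) > 0:
            beds[ward] -= 1  # the next atomic decision sees the updated state
        else:
            ward = WAIT  # infeasible or declined overflow: patient waits
        decisions.append((patient_class, ward))
    return decisions


def greedy_policy(patient_class: int, beds: Dict[int, int]) -> int:
    """Toy stand-in for a learned policy: lowest-numbered non-primary ward with a free bed."""
    for ward in sorted(beds):
        if ward != patient_class and beds[ward] > 0:
            return ward
    return WAIT


# Two class-0 patients and one class-1 patient; wards 1 and 2 each have one free bed.
print(route_sequentially([0, 0, 1], {0: 0, 1: 1, 2: 1}, greedy_policy))
```

Under these assumptions, each atomic step chooses among at most (number of wards + 1) options, whereas a joint decision over all waiting patients would range over a set that grows combinatorially with the number of patients.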
fields: cs.AI (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Bellman-Taylor Score Decoding for Markov Decision Processes with State-Dependent Feasible Action Sets
Bellman-Taylor score decoding framework for MDPs with implicit state-dependent action constraints, enabling standard DRL optimization with a decomposed optimality gap guarantee.