Differentiable Discrete Event Simulation for Queuing Network Control
2 Pith papers cite this work.
Citing papers by year: 2026 (2).
Citing papers
- Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
  HPO enables unbiased policy optimization in hybrid action spaces by mixing differentiable simulation gradients with score-function estimates, and it outperforms PPO by a wider margin as the number of continuous action dimensions increases.
- QPLEX Decision Processes: Formulation via Nonlinear Markov Chains and Optimization via Policy Gradients
  QPLEX Decision Processes embed QPLEX transient approximations into a nonlinear MDP framework and optimize policies via deterministic gradients and natural-gradient methods, demonstrated on a dynamic pricing problem with waiting costs or chance constraints.
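The mixed-gradient idea in the HPO entry can be illustrated with a toy hybrid action: a discrete branch sampled from a softmax, followed by a reparameterized Gaussian continuous action whose reward is differentiable. This is a minimal sketch under stated assumptions, not the paper's algorithm: the toy reward, the names `mixed_gradient` and `exact_gradient`, and all parameter values are hypothetical. It uses a score-function (REINFORCE) estimate for the discrete logits and a pathwise (reparameterization) estimate for the continuous means, then compares both against the closed-form gradient.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def mixed_gradient(logits, mu, sigma, target, bonus, n_samples, rng):
    """Monte Carlo gradient of J = E[r] for a hypothetical toy hybrid action.

    Discrete part (logits): score-function estimator.
    Continuous part (mu):   pathwise estimator through a = mu[k] + sigma * eps.
    """
    p = softmax(logits)
    k = rng.choice(len(p), size=n_samples, p=p)   # sampled discrete branches
    eps = rng.standard_normal(n_samples)
    a = mu[k] + sigma * eps                       # reparameterized continuous actions
    r = -(a - target[k]) ** 2 + bonus[k]          # hypothetical reward, differentiable in a

    # Score function: grad of log softmax(logits)[k] is onehot(k) - p
    onehot = np.eye(len(p))[k]
    g_logits = (r[:, None] * (onehot - p)).mean(axis=0)

    # Pathwise: dr/da = -2 (a - target[k]) and da/dmu[k] = 1
    dr_da = -2.0 * (a - target[k])
    g_mu = np.zeros_like(mu)
    np.add.at(g_mu, k, dr_da)                     # accumulate per sampled branch
    g_mu /= n_samples
    return g_logits, g_mu

def exact_gradient(logits, mu, sigma, target, bonus):
    """Closed-form gradient of the same toy objective, for checking."""
    p = softmax(logits)
    R = -(mu - target) ** 2 - sigma ** 2 + bonus  # E[r | branch k]
    g_logits = p * (R - p @ R)
    g_mu = p * (-2.0 * (mu - target))
    return g_logits, g_mu

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = np.array([0.0, 0.0])
    mu = np.array([0.0, 0.0])
    target = np.array([1.0, -1.0])  # hypothetical per-branch targets
    bonus = np.array([0.0, 0.5])    # hypothetical per-branch bonus
    print("estimate:", mixed_gradient(logits, mu, 0.5, target, bonus, 200_000, rng))
    print("exact:   ", exact_gradient(logits, mu, 0.5, target, bonus))
```

Both estimators are unbiased here: the score-function term only needs log-probabilities of the discrete choice, while the pathwise term differentiates through the sampled continuous action, which typically gives lower variance for the continuous parameters.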
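The QPLEX entry mentions natural-gradient methods. As a generic illustration only, and not the QPLEX formulation (which optimizes over nonlinear Markov chain approximations of queue dynamics), the sketch below runs damped natural-gradient ascent on a softmax policy over a hypothetical price grid, with a made-up expected revenue of price × exp(−price). For a softmax policy the Fisher matrix is diag(p) − ppᵀ, which is singular, so a small damping term λI is added before solving. The function name `natural_gradient_pricing` and its defaults are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def natural_gradient_pricing(prices, steps=200, lr=1.0, damping=1e-3):
    """Damped natural-gradient ascent on J(theta) = E_{k ~ softmax(theta)}[R_k].

    Hypothetical toy: R_k = price_k * exp(-price_k) stands in for expected revenue.
    """
    revenue = prices * np.exp(-prices)
    theta = np.zeros(len(prices))
    history = []
    for _ in range(steps):
        p = softmax(theta)
        history.append(p @ revenue)               # expected revenue under current policy
        grad = p * (revenue - p @ revenue)         # exact policy gradient dJ/dtheta
        fisher = np.diag(p) - np.outer(p, p)       # Fisher information of the softmax policy
        step = np.linalg.solve(fisher + damping * np.eye(len(p)), grad)
        theta = theta + lr * step                  # natural-gradient update
    return softmax(theta), history

if __name__ == "__main__":
    prices = np.array([0.5, 1.0, 1.5, 2.0])        # hypothetical price grid
    policy, history = natural_gradient_pricing(prices)
    print("final policy:", np.round(policy, 3))
    print("expected revenue: start", round(history[0], 4), "end", round(history[-1], 4))
```

Preconditioning by the inverse Fisher makes the update insensitive to how the policy is parameterized; without damping the solve would fail because the all-ones direction lies in the Fisher matrix's null space.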