MDPO improves differentiable planning by injecting gradient-sensitivity-adapted noise into the action space, outperforming both deterministic variants and PPO on nonlinear and hybrid benchmarks.
pyrddlgym: From rddl to gym environments
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Model-Driven Policy Optimization in Differentiable Simulators via Stochastic Exploration
MDPO improves differentiable planning by injecting gradient-sensitivity-adapted noise into the action space, outperforming both deterministic variants and PPO on nonlinear and hybrid benchmarks.