Trust region policy optimization of POMDPs
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: 2 unverdicted
citing papers explorer
- Action-Gradient Monte Carlo Tree Search for Non-Parametric Continuous (PO)MDPs
AGMCTS augments MCTS with action-score gradients for particle beliefs, a Multiple Importance Sampling tree for reuse, and Area Formula gradients for smooth models, outperforming prior sample-based solvers on continuous benchmarks.
- Optimistic Proximal Policy Optimization
OPPO augments PPO with optimistic policy evaluation driven by return uncertainty estimates and shows improved results over prior methods on a tabular sparse-reward task.