Feasible value functions in POMDPs under memoryless policies form a semi-algebraic set defined by polynomial inequalities from the model parameters.
Trust region policy optimization of POMDPs
3 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: 3 unverdicted
Representative citing papers:
AGMCTS augments MCTS with action-score gradients for particle beliefs, a Multiple Importance Sampling tree for reuse, and Area Formula gradients for smooth models, outperforming prior sample-based solvers on continuous benchmarks.
OPPO augments PPO with optimistic policy evaluation driven by return uncertainty estimates and shows improved results over prior methods on a tabular sparse-reward task.