QMDP-Net: Deep Learning for Planning under Partial Observability

David Hsu; Peter Karkus; Wee Sun Lee

arxiv: 1703.06692 · v3 · pith:WM5PTMFPnew · submitted 2017-03-20 · cs.AI · cs.LG · cs.NE · stat.ML

classification cs.AI · cs.LG · cs.NE · stat.ML

keywords qmdp-net · planning · learning · tasks · algorithm · network · architecture · end-to-end

This paper introduces the QMDP-net, a neural network architecture for planning under partial observability. The QMDP-net combines the strengths of model-free learning and model-based planning. It is a recurrent policy network, but it represents a policy for a parameterized set of tasks by connecting a model with a planning algorithm that solves the model, thus embedding the solution structure of planning in a network learning architecture. The QMDP-net is fully differentiable and allows for end-to-end training. We train a QMDP-net on different tasks so that it can generalize to new ones in the parameterized task set and "transfer" to other similar tasks beyond the set. In preliminary experiments, QMDP-net showed strong performance on several robotic tasks in simulation. Interestingly, while QMDP-net encodes the QMDP algorithm, it sometimes outperforms the QMDP algorithm in the experiments, as a result of end-to-end learning.
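For readers unfamiliar with the QMDP algorithm that the abstract says the network encodes, the following is a minimal tabular sketch of the classic algorithm, not the paper's differentiable architecture. It assumes a known model; the two-state toy problem, the array layouts `T[a, s, s']`, `R[a, s]`, `Z[a, s', o]`, and all function names are hypothetical and chosen only for illustration. QMDP solves the fully observable MDP by value iteration, tracks a belief with a Bayes filter, and picks the action maximizing the belief-weighted Q-value.

```python
import numpy as np

def qmdp_q_values(T, R, gamma=0.9, iters=200):
    """Value iteration on the underlying (fully observable) MDP.
    T[a, s, s'] are transition probabilities, R[a, s] are rewards.
    Returns Q[a, s]."""
    num_actions, num_states, _ = T.shape
    Q = np.zeros((num_actions, num_states))
    for _ in range(iters):
        V = Q.max(axis=0)          # V[s] = max_a Q[a, s]
        Q = R + gamma * (T @ V)    # expected next-state value for each (a, s)
    return Q

def belief_update(b, a, o, T, Z):
    """Bayes filter step. Z[a, s', o] = P(o | s', a)."""
    predicted = b @ T[a]               # sum_s b(s) T(s' | s, a)
    unnorm = Z[a][:, o] * predicted    # weight by observation likelihood
    return unnorm / unnorm.sum()

def qmdp_action(b, Q):
    """QMDP approximation: Q(b, a) = sum_s b(s) Q(a, s)."""
    return int(np.argmax(Q @ b))

# Hypothetical toy model: 2 static states, 2 actions, 2 observations.
# Action i earns +1 if the hidden state is i and -1 otherwise.
T = np.stack([np.eye(2), np.eye(2)])          # state never changes
R = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
Z = np.stack([np.array([[0.8, 0.2],           # observation matches the
                        [0.2, 0.8]])] * 2)    # state with probability 0.8

Q = qmdp_q_values(T, R)
b0 = np.array([0.5, 0.5])                     # uniform initial belief
b1 = belief_update(b0, a=0, o=1, T=T, Z=Z)    # observe "state 1"
chosen = qmdp_action(b1, Q)
print(b1, chosen)
```

In this sketch the model is fixed and hand-specified. The abstract's point is that QMDP-net instead embeds this model-plus-planner structure in a recurrent network and learns it end-to-end, which is the stated reason it can sometimes outperform the QMDP algorithm itself.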

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

An adaptive variance estimator for relative sparsity
stat.ME 2026-05 unverdicted novelty 6.0

A new adaptive variance estimator for relative sparsity coefficients is introduced that fully utilizes the prior asymptotic normality theorem and incorporates variable selection effects.