arxiv: 1804.02808 · v2 · pith:GDURU7NJnew · submitted 2018-04-09 · 💻 cs.LG · cs.AI · stat.ML

Latent Space Policies for Hierarchical Reinforcement Learning

classification 💻 cs.LG cs.AI stat.ML
keywords layer, latent, layers, learning, policies, higher, lower, reinforcement

We address the problem of learning hierarchical deep neural network policies for reinforcement learning. In contrast to methods that explicitly restrict or cripple lower layers of a hierarchy to force them to use higher-level modulating signals, each layer in our framework is trained to directly solve the task, but acquires a range of diverse strategies via a maximum entropy reinforcement learning objective. Each layer is also augmented with latent random variables, which are sampled from a prior distribution during the training of that layer. The maximum entropy objective causes these latent variables to be incorporated into the layer's policy, and the higher level layer can directly control the behavior of the lower layer through this latent space. Furthermore, by constraining the mapping from latent variables to actions to be invertible, higher layers retain full expressivity: neither the higher layers nor the lower layers are constrained in their behavior. Our experimental evaluation demonstrates that we can improve on the performance of single-layer policies on standard benchmark tasks simply by adding additional layers, and that our method can solve more complex sparse-reward tasks by learning higher-level policies on top of high-entropy skills optimized for simple low-level objectives.
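The layered construction described in the abstract can be illustrated with a toy sketch. The abstract does not specify how the invertible mapping is parameterized, so the state-conditioned affine map below, the class name `AffineLatentLayer`, the helpers `act` and `invert`, and the random untrained weights are all assumptions made for illustration, not the authors' implementation. The sketch only shows two properties the abstract names: a higher layer drives a lower layer through its latent input, and because each map is invertible, any action can be traced back to a unique top-level latent.

```python
import numpy as np


class AffineLatentLayer:
    """Hypothetical layer: a = mu(s) + exp(log_sigma(s)) * h.

    The map from latent h to output a is invertible for every state s
    because exp(.) is strictly positive. Weights are random, not trained.
    """

    def __init__(self, state_dim, action_dim, rng):
        self.W_mu = rng.normal(scale=0.1, size=(action_dim, state_dim))
        self.W_ls = rng.normal(scale=0.1, size=(action_dim, state_dim))

    def forward(self, state, latent):
        mu = self.W_mu @ state
        log_sigma = self.W_ls @ state
        return mu + np.exp(log_sigma) * latent

    def inverse(self, state, output):
        mu = self.W_mu @ state
        log_sigma = self.W_ls @ state
        return (output - mu) * np.exp(-log_sigma)


def act(layers, state, top_latent):
    """Push a top-level latent down the stack (layers[0] is the lowest)."""
    h = top_latent
    for layer in reversed(layers):  # highest layer first
        h = layer.forward(state, h)
    return h  # output of the lowest layer is the environment action


def invert(layers, state, action):
    """Recover the top-level latent that produces a given action."""
    h = action
    for layer in layers:  # lowest layer first
        h = layer.inverse(state, h)
    return h


rng = np.random.default_rng(0)
state_dim, action_dim = 4, 2
layers = [AffineLatentLayer(state_dim, action_dim, rng) for _ in range(2)]

state = rng.normal(size=state_dim)
top_latent = rng.normal(size=action_dim)  # sampled from a standard normal prior

action = act(layers, state, top_latent)
recovered = invert(layers, state, action)
```

In this sketch the latent and action dimensions are equal, which an invertible map requires. Training each layer with a maximum entropy objective, as the abstract describes, is outside the scope of the example.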

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Learning World Graphs to Accelerate Hierarchical Reinforcement Learning

    cs.LG · 2019-07 · unverdicted · novelty 6.0

    A two-stage framework learns a world graph of pivotal states task-agnostically via joint training of a latent model and curiosity-driven policy, then uses the graph to accelerate hierarchical RL on maze tasks.

  2. Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives

    cs.LG · 2019-06 · unverdicted · novelty 6.0

    RL policies decompose into information-regularized primitives that compete on how much state information each requests, with the greediest one acting, yielding better generalization than flat or hierarchical baselines.

  3. Neural Embedding for Physical Manipulations

    cs.LG · 2019-07 · unverdicted · novelty 4.0

    A generative model with a normalized pairwise distance constraint discovers output space topologies from sparse data and outperforms GANs and VAEs by avoiding mode collapse.

  4. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review

    cs.LG · 2018-05 · unverdicted · novelty 3.0

    Maximum entropy reinforcement learning is equivalent to exact probabilistic inference for deterministic dynamics and variational inference for stochastic dynamics.