Latent Space Policies for Hierarchical Reinforcement Learning
read the original abstract
We address the problem of learning hierarchical deep neural network policies for reinforcement learning. In contrast to methods that explicitly restrict or cripple lower layers of a hierarchy to force them to use higher-level modulating signals, each layer in our framework is trained to directly solve the task, but acquires a range of diverse strategies via a maximum entropy reinforcement learning objective. Each layer is also augmented with latent random variables, which are sampled from a prior distribution during the training of that layer. The maximum entropy objective causes these latent variables to be incorporated into the layer's policy, and the higher level layer can directly control the behavior of the lower layer through this latent space. Furthermore, by constraining the mapping from latent variables to actions to be invertible, higher layers retain full expressivity: neither the higher layers nor the lower layers are constrained in their behavior. Our experimental evaluation demonstrates that we can improve on the performance of single-layer policies on standard benchmark tasks simply by adding additional layers, and that our method can solve more complex sparse-reward tasks by learning higher-level policies on top of high-entropy skills optimized for simple low-level objectives.
This paper has not been read by Pith yet.
Forward citations
Cited by 4 Pith papers
-
Learning World Graphs to Accelerate Hierarchical Reinforcement Learning
A two-stage framework learns a world graph of pivotal states task-agnostically via joint training of a latent model and curiosity-driven policy, then uses the graph to accelerate hierarchical RL on maze tasks.
-
Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives
RL policies decompose into information-regularized primitives that compete by requesting state information amounts, with the greediest one acting, yielding better generalization than flat or hierarchical baselines.
-
Neural Embedding for Physical Manipulations
Generative model with normalized pairwise distance constraint discovers output space topologies from sparse data and outperforms GANs and VAEs by avoiding mode collapse.
-
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review
Maximum entropy reinforcement learning is equivalent to exact probabilistic inference for deterministic dynamics and variational inference for stochastic dynamics.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.