Inter-Level Cooperation in Hierarchical Reinforcement Learning
Hierarchies of temporally decoupled policies present a promising approach for enabling structured exploration in complex long-term planning problems. To fully realize this approach, an end-to-end training paradigm is needed. However, training these multi-level policies has had limited success due to challenges arising from interactions between the goal-assigning and goal-achieving levels within a hierarchy. In this article, we treat the policy optimization process as a multi-agent process. This allows us to draw on connections between communication and cooperation in multi-agent RL, and to demonstrate the benefits of increased cooperation between sub-policies on the training performance of the overall policy. We introduce a simple yet effective technique for inducing inter-level cooperation by modifying the objective function and subsequent gradients of higher-level policies. Experimental results on a wide variety of simulated robotics and traffic control tasks demonstrate that inducing cooperation results in stronger-performing policies and increased sample efficiency on a set of difficult long-time-horizon tasks. We also find that goal-conditioned policies trained using our method display better transfer to new tasks, highlighting the benefits of our method in learning task-agnostic lower-level behaviors. Videos and code are available at: https://sites.google.com/berkeley.edu/cooperative-hrl.
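As a rough illustration of what "modifying the objective function of higher-level policies" could look like, the sketch below adds a weighted copy of the lower-level return to the higher-level return. The additive form, the function names, and the `coop_weight` parameter are assumptions made for illustration; the abstract does not specify them, and the authors' actual formulation may differ.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Sum of rewards discounted by gamma, accumulated from the last step backward."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def cooperative_objective(
    high_rewards: Sequence[float],
    low_rewards: Sequence[float],
    coop_weight: float = 0.5,  # hypothetical cooperation weight, not from the source
    gamma: float = 0.99,
) -> float:
    """Higher-level objective augmented with the lower level's return.

    Assumed form: J_high + coop_weight * J_low. With coop_weight = 0 this
    reduces to the ordinary higher-level return.
    """
    j_high = discounted_return(high_rewards, gamma)
    j_low = discounted_return(low_rewards, gamma)
    return j_high + coop_weight * j_low
```

Under this assumed form, a goal that the lower level cannot reach lowers the lower-level return and therefore also lowers the higher-level objective, which is one way a goal-assigning policy might be pushed toward achievable goals.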
Forward citations
Cited by 2 Pith papers
- Temporal Transfer Learning for Traffic Optimization with Coarse-grained Advisory Autonomy
  Temporal Transfer Learning selects source tasks for zero-shot transfer of RL policies to solve a range of coarse-grained advisory autonomy hold durations in traffic optimization more reliably than baselines.
- Mission-Aligned Learning-Informed Control of Autonomous Systems: Formulation and Foundations
  The paper formulates a two-level optimization scheme integrating control, classical planning, and reinforcement learning to improve safety and interpretability in autonomous systems.