arxiv: 1912.02368 · v3 · pith:4CPUSMKYnew · submitted 2019-12-05 · 💻 cs.LG · cs.AI · stat.ML

Inter-Level Cooperation in Hierarchical Reinforcement Learning

classification 💻 cs.LG · cs.AI · stat.ML
keywords cooperation, policies, tasks, training, approach, benefits, demonstrate, increased

Hierarchies of temporally decoupled policies present a promising approach for enabling structured exploration in complex long-term planning problems. To fully realize this approach, an end-to-end training paradigm is needed. However, training these multi-level policies has had limited success due to challenges arising from interactions between the goal-assigning and goal-achieving levels within a hierarchy. In this article, we consider the policy optimization process as a multi-agent process. This allows us to draw on connections between communication and cooperation in multi-agent RL, and to demonstrate the benefits of increased cooperation between sub-policies on the training performance of the overall policy. We introduce a simple yet effective technique for inducing inter-level cooperation by modifying the objective function and subsequent gradients of higher-level policies. Experimental results on a wide variety of simulated robotics and traffic control tasks demonstrate that inducing cooperation results in stronger-performing policies and increased sample efficiency on a set of difficult long-time-horizon tasks. We also find that goal-conditioned policies trained using our method display better transfer to new tasks, highlighting the benefits of our method in learning task-agnostic lower-level behaviors. Videos and code are available at: https://sites.google.com/berkeley.edu/cooperative-hrl.
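
The abstract states only that cooperation is induced by modifying the higher-level objective; it does not give the exact form. As a rough illustration, one plausible reading is that the higher-level policy's return is augmented with a weighted share of the lower-level policy's return. The sketch below assumes that form. The function names, the `coop_weight` parameter, and the additive combination are hypothetical and are not taken from the paper or its released code.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of rewards discounted by gamma, computed back to front."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def cooperative_manager_objective(
    manager_rewards: Sequence[float],
    worker_rewards: Sequence[float],
    gamma: float,
    coop_weight: float,
) -> float:
    """Hypothetical cooperative objective for the goal-assigning level.

    Assumption (not from the source): the higher-level return is
    combined additively with the lower-level return, scaled by
    coop_weight. Setting coop_weight to 0 recovers the plain
    higher-level return.
    """
    manager_term = discounted_return(manager_rewards, gamma)
    worker_term = discounted_return(worker_rewards, gamma)
    return manager_term + coop_weight * worker_term
```

Under this assumed form, a higher-level policy that assigns goals the lower level cannot reach is penalized through the lower level's poor return, which is one way the two levels could be pushed toward cooperation.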

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Temporal Transfer Learning for Traffic Optimization with Coarse-grained Advisory Autonomy

    cs.RO 2023-11 unverdicted novelty 7.0

    Temporal Transfer Learning selects source tasks for zero-shot transfer of RL policies to solve a range of coarse-grained advisory autonomy hold durations in traffic optimization more reliably than baselines.

  2. Mission-Aligned Learning-Informed Control of Autonomous Systems: Formulation and Foundations

    math.OC 2025-07 unverdicted novelty 4.0

    The paper formulates a two-level optimization scheme integrating control, classical planning, and reinforcement learning to improve safety and interpretability in autonomous systems.