Near-Optimal Representation Learning for Hierarchical Reinforcement Learning

Honglak Lee; Ofir Nachum; Sergey Levine; Shixiang Gu

arxiv: 1810.01257 · v2 · pith:B3ABKTSNnew · submitted 2018-10-02 · 💻 cs.AI

Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine

keywords: representation, learning, hierarchical, better, expressions, policy, problem, reinforcement

Abstract

We study the problem of representation learning in goal-conditioned hierarchical reinforcement learning. In such hierarchical structures, a higher-level controller solves tasks by iteratively communicating goals which a lower-level policy is trained to reach. Accordingly, the choice of representation -- the mapping of observation space to goal space -- is crucial. To study this problem, we develop a notion of sub-optimality of a representation, defined in terms of expected reward of the optimal hierarchical policy using this representation. We derive expressions which bound the sub-optimality and show how these expressions can be translated to representation learning objectives which may be optimized in practice. Results on a number of difficult continuous-control tasks show that our approach to representation learning yields qualitatively better representations as well as quantitatively better hierarchical policies, compared to existing methods (see videos at https://sites.google.com/view/representation-hrl).
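
The hierarchical structure described in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not the authors' method: the 4-D state, the representation `f` (which simply keeps the position), the hand-written stand-in policies, the goal interval `c`, and the toy dynamics in `env_step` are all hypothetical. The sketch only shows the control flow: a higher-level controller emits a goal in goal space every `c` steps, and a lower-level policy is scored by how close `f(next_state)` comes to that goal.

```python
import math

def f(state):
    """Hypothetical representation: map a 4-D state (x, y, vx, vy) to a 2-D goal space."""
    x, y, _vx, _vy = state
    return (x, y)

def low_level_reward(next_state, goal):
    """Assumed lower-level reward: negative distance to the goal in goal space."""
    px, py = f(next_state)
    return -math.hypot(px - goal[0], py - goal[1])

def low_level_policy(state, goal, max_step=0.1):
    """Stand-in for a trained lower-level policy: move toward the goal."""
    px, py = f(state)
    dx, dy = goal[0] - px, goal[1] - py
    dist = math.hypot(dx, dy)
    if dist <= max_step:  # close enough to land on the goal (also covers dist == 0)
        return (dx, dy)
    scale = max_step / dist
    return (dx * scale, dy * scale)

def high_level_policy(state, target, horizon=0.5):
    """Stand-in for a higher-level controller: propose a nearby goal toward the target."""
    px, py = f(state)
    dx, dy = target[0] - px, target[1] - py
    dist = math.hypot(dx, dy)
    if dist <= horizon:
        return target
    scale = horizon / dist
    return (px + dx * scale, py + dy * scale)

def env_step(state, action):
    """Toy dynamics (an assumption): the action is a displacement of the position."""
    x, y, _vx, _vy = state
    ax, ay = action
    return (x + ax, y + ay, ax, ay)

def run_episode(start, target, c=5, num_steps=100):
    """Run the two-level loop: a new goal every c steps, low-level actions in between."""
    state, goal, rewards = start, None, []
    for t in range(num_steps):
        if t % c == 0:
            goal = high_level_policy(state, target)
        action = low_level_policy(state, goal)
        state = env_step(state, action)
        rewards.append(low_level_reward(state, goal))
    return state, rewards

final_state, rewards = run_episode((0.0, 0.0, 0.0, 0.0), (2.0, 1.0))
```

In this toy setting `f` happens to keep exactly the information needed to reach the target. A representation that discarded the position instead would leave the higher-level controller unable to express useful goals, which is the kind of loss the abstract's notion of sub-optimality is meant to quantify.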

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learning World Graphs to Accelerate Hierarchical Reinforcement Learning
cs.LG · 2019-07 · unverdicted · novelty 6.0

A two-stage framework learns a world graph of pivotal states task-agnostically via joint training of a latent model and curiosity-driven policy, then uses the graph to accelerate hierarchical RL on maze tasks.

Abstraction for Offline Goal-Conditioned Reinforcement Learning
cs.LG · 2026-05 · unverdicted · novelty 5.0

Introduces relativised options and hierarchical abstraction to reuse experience across similar contexts in offline GCRL, with two algorithms demonstrating performance gains.

Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
cs.LG · 2025-06 · unverdicted · novelty 5.0

SSE improves long-horizon goal-conditioned RL by using failure and partial-success transitions to identify unreliable subgoals, streamline high-level planning, and outperform prior hierarchical methods on benchmarks.