Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings

John D · 2018 · cs.LG · arXiv 1806.02813

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

open full Pith review browse 3 citing papers arXiv PDF

abstract

In this work, we take a representation learning perspective on hierarchical reinforcement learning, where the problem of learning lower layers in a hierarchy is transformed into the problem of learning trajectory-level generative models. We show that we can learn continuous latent representations of trajectories, which are effective in solving temporally extended and multi-stage problems. Our proposed model, SeCTAR, draws inspiration from variational autoencoders, and learns latent representations of trajectories. A key component of this method is to learn both a latent-conditioned policy and a latent-conditioned model which are consistent with each other. Given the same latent, the policy generates a trajectory which should match the trajectory predicted by the model. This model provides a built-in prediction mechanism, by predicting the outcome of closed loop policy behavior. We propose a novel algorithm for performing hierarchical RL with this model, combining model-based planning in the learned latent space with an unsupervised exploration objective. We show that our model is effective at reasoning over long horizons with sparse rewards for several simulated tasks, outperforming standard reinforcement learning methods and prior methods for hierarchical reasoning, model-based planning, and exploration.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Zero-shot Imitation Learning by Latent Topology Mapping

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

ZALT learns latent hub states and hub-to-hub dynamics from demonstrations to plan zero-shot solutions for unseen start-goal tasks, achieving 55% success in a 3D maze versus 6% for baselines.

Learning World Graphs to Accelerate Hierarchical Reinforcement Learning

cs.LG · 2019-07-01 · unverdicted · novelty 6.0

A two-stage framework learns a world graph of pivotal states task-agnostically via joint training of a latent model and curiosity-driven policy, then uses the graph to accelerate hierarchical RL on maze tasks.

Shaping Belief States with Generative Environment Models for RL

cs.LG · 2019-06-21 · unverdicted · novelty 5.0

Multi-step predictive generative models form stable belief states capturing environment layout and agent pose, yielding higher data efficiency on RL tasks than model-free agents.

citing papers explorer

Showing 3 of 3 citing papers.

Zero-shot Imitation Learning by Latent Topology Mapping cs.LG · 2026-05-08 · unverdicted · none · ref 10
ZALT learns latent hub states and hub-to-hub dynamics from demonstrations to plan zero-shot solutions for unseen start-goal tasks, achieving 55% success in a 3D maze versus 6% for baselines.
Learning World Graphs to Accelerate Hierarchical Reinforcement Learning cs.LG · 2019-07-01 · unverdicted · none · ref 20 · internal anchor
A two-stage framework learns a world graph of pivotal states task-agnostically via joint training of a latent model and curiosity-driven policy, then uses the graph to accelerate hierarchical RL on maze tasks.
Shaping Belief States with Generative Environment Models for RL cs.LG · 2019-06-21 · unverdicted · none · ref 31 · internal anchor
Multi-step predictive generative models form stable belief states capturing environment layout and agent pose, yielding higher data efficiency on RL tasks than model-free agents.

Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer