
arxiv: 1808.09105 · v4 · pith:HUTWIJFKnew · submitted 2018-08-28 · 💻 cs.LG · cs.RO · stat.ML

SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning

classification 💻 cs.LG cs.RO stat.ML
keywords model-based, learning, method, observations, representations, approach, complex, data

Model-based reinforcement learning (RL) has proven to be a data-efficient approach for learning control tasks but is difficult to utilize in domains with complex observations such as images. In this paper, we present a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observations, in that these representations are optimized for inferring simple dynamics and cost models given data from the current policy. This enables a model-based RL method based on the linear-quadratic regulator (LQR) to be used for systems with image observations. We evaluate our approach on a range of robotics tasks, including manipulation with a real-world robotic arm directly from images. We find that our method produces substantially better final performance than other model-based RL methods while being significantly more efficient than model-free RL.
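
For readers unfamiliar with the LQR component mentioned in the abstract, the following is a minimal sketch of a generic finite-horizon, discrete-time LQR backward pass. It is not the SOLAR algorithm and is not taken from the paper: it assumes the linear dynamics matrices `A`, `B` and quadratic cost matrices `Q`, `R` are already known, whereas the abstract describes learning a representation in which such simple models can be inferred from data. The function names `lqr_backward_pass` and `rollout`, and the double-integrator example, are illustrative assumptions.

```python
import numpy as np


def lqr_backward_pass(A, B, Q, R, horizon):
    """Compute time-varying gains K_0..K_{horizon-1} for the policy u_t = -K_t x_t.

    Assumes dynamics x_{t+1} = A x_t + B u_t and stage cost
    x_t^T Q x_t + u_t^T R u_t, with terminal cost x_T^T Q x_T (an assumption).
    """
    P = Q.copy()  # cost-to-go matrix at the final time step
    gains = []
    for _ in range(horizon):
        BtP = B.T @ P
        # K = (R + B^T P B)^{-1} B^T P A, solved without forming an explicit inverse
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        # Riccati recursion: P <- Q + A^T P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains were computed from the last step back to the first
    return gains


def rollout(A, B, gains, x0):
    """Simulate the closed-loop system and return the visited states."""
    x = np.asarray(x0, dtype=float)
    states = [x]
    for K in gains:
        u = -K @ x
        x = A @ x + B @ u
        states.append(x)
    return np.stack(states)


if __name__ == "__main__":
    # Hypothetical double integrator (position, velocity) with time step dt.
    dt = 0.1
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.0], [dt]])
    Q = np.eye(2)
    R = 0.1 * np.eye(1)
    gains = lqr_backward_pass(A, B, Q, R, horizon=50)
    states = rollout(A, B, gains, x0=[1.0, 0.0])
    print("initial state:", states[0], "final state:", states[-1])
```

The backward recursion only needs linear dynamics and a quadratic cost, which is why a representation optimized for such simple models makes an LQR-based method applicable to image observations.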

This paper has not been read by Pith yet.


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Solving Rubik's Cube with a Robot Hand

    cs.LG 2019-10 accept novelty 7.0

    Reinforcement learning models trained only in simulation using automatic domain randomization solve Rubik's cube with a real robot hand.

  2. Exploring Model-based Planning with Policy Networks

    cs.LG 2019-06 unverdicted novelty 7.0

    POPLIN combines policy networks with model-predictive planning by optimizing either action sequences or policy parameters, yielding 3x better sample efficiency than PETS, TD3 and SAC on MuJoCo locomotion tasks.

  3. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

    cs.LG 2020-05 unverdicted novelty 2.0

    Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.