SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning

Marvin Zhang; Sharad Vikram; Laura Smith; Pieter Abbeel; Matthew J. Johnson; Sergey Levine

arXiv: 1808.09105 · v4 · pith:HUTWIJFKnew · submitted 2018-08-28 · cs.LG · cs.RO · stat.ML

Abstract

Model-based reinforcement learning (RL) has proven to be a data-efficient approach for learning control tasks but is difficult to use in domains with complex observations such as images. In this paper, we present a method for learning representations that are suitable for iterative model-based policy improvement, even when the underlying dynamical system has complex dynamics and image observations. These representations are optimized for inferring simple dynamics and cost models given data from the current policy, which enables a model-based RL method based on the linear-quadratic regulator (LQR) to be used for systems with image observations. We evaluate our approach on a range of robotics tasks, including manipulation with a real-world robotic arm directly from images. We find that our method produces substantially better final performance than other model-based RL methods while being significantly more efficient than model-free RL.
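The abstract's central idea is that once images are mapped to a latent state whose dynamics and cost are simple, an LQR-based planner can compute feedback controllers in that latent space. LQR assumes linear dynamics and a quadratic cost. The sketch below is only the generic textbook finite-horizon, discrete-time LQR (backward Riccati recursion) on an assumed, already-known linear latent model; it is not SOLAR's own LQR variant, and it does not include the paper's representation learning or model inference. The matrices A, B, Q, R, Qf and the helper names lqr_gains and rollout are hypothetical, chosen for illustration.

```python
import numpy as np


def lqr_gains(A, B, Q, R, Qf, horizon):
    """Finite-horizon discrete-time LQR via the backward Riccati recursion.

    Dynamics: z_{t+1} = A z_t + B u_t
    Cost:     sum_t (z_t^T Q z_t + u_t^T R u_t) + z_T^T Qf z_T
    Returns gains K_0, ..., K_{horizon-1} such that u_t = -K_t z_t.
    """
    P = Qf  # cost-to-go matrix at the final time step
    gains = []
    for _ in range(horizon):
        # Solve (R + B^T P B) K = B^T P A instead of forming an explicit inverse.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: cost-to-go matrix one step earlier in time.
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains were computed from t = horizon-1 back to t = 0
    return gains


def rollout(A, B, gains, z0):
    """Apply the time-varying feedback u_t = -K_t z_t and return all states."""
    z = z0
    states = [z]
    for K in gains:
        u = -K @ z
        z = A @ z + B @ u
        states.append(z)
    return states


# Hypothetical 2-D latent model: a simple discretized double integrator (dt = 0.1).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = 0.1 * np.eye(1)
Qf = 10.0 * np.eye(2)

gains = lqr_gains(A, B, Q, R, Qf, horizon=50)
states = rollout(A, B, gains, z0=np.array([1.0, 0.0]))
print(np.linalg.norm(states[-1]))  # small: the controller drives z toward 0
```

Using np.linalg.solve rather than an explicit matrix inverse is the usual numerically safer choice. In a learned-representation setting such as the one the abstract describes, A, B and the cost terms would come from models fit to latent data rather than being written by hand.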

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Solving Rubik's Cube with a Robot Hand
cs.LG 2019-10 accept novelty 7.0

Reinforcement learning models trained only in simulation using automatic domain randomization solve Rubik's cube with a real robot hand.
Exploring Model-based Planning with Policy Networks
cs.LG 2019-06 unverdicted novelty 7.0

POPLIN combines policy networks with model-predictive planning by optimizing either action sequences or policy parameters, yielding 3x better sample efficiency than PETS, TD3 and SAC on MuJoCo locomotion tasks.
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
cs.LG 2020-05 unverdicted novelty 2.0

Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.