From Pixels to Torques: Policy Learning with Deep Dynamical Models

Marc Peter Deisenroth; Niklas Wahlström; Thomas B. Schön

arxiv: 1502.02251 · v3 · pith:BKH6M6FTnew · submitted 2015-02-08 · stat.ML · cs.LG · cs.RO · cs.SY

classification: stat.ML, cs.LG, cs.RO, cs.SY

keywords: learning, closed-loop, control, deep, model, pixels, policy, torques

Abstract

Data-efficient learning in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. In this paper, we consider one instance of this challenge, the pixels to torques problem, where an agent must learn a closed-loop control policy from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model that uses deep auto-encoders to learn a low-dimensional embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning ensures that not only static but also dynamic properties of the data are accounted for. This is crucial for long-term predictions, which lie at the core of the adaptive model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art reinforcement learning methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces and is an important step toward fully autonomous learning from pixels to torques.
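The joint-learning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes linear stand-ins for the deep auto-encoder and for the latent predictive model, and all names (`encode`, `decode`, `predict_latent`, `joint_loss`) and sizes (`D`, `M`, `U`) are hypothetical. The point it shows is that one objective sums a reconstruction error (static properties) and a one-step prediction error measured after decoding the predicted features (dynamic properties), so the embedding and the predictor would be trained together.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: pixels per image, latent features, control inputs.
D, M, U = 64, 2, 1

# Hypothetical linear stand-ins for the deep encoder, decoder, and latent predictor.
params = {
    "W_enc": rng.normal(scale=0.1, size=(D, M)),
    "W_dec": rng.normal(scale=0.1, size=(M, D)),
    "A": rng.normal(scale=0.1, size=(M, M)),
    "B": rng.normal(scale=0.1, size=(U, M)),
}

def encode(params, y):
    """Map images of shape (N, D) to low-dimensional features of shape (N, M)."""
    return np.tanh(y @ params["W_enc"])

def decode(params, z):
    """Map features of shape (N, M) back to image space, shape (N, D)."""
    return z @ params["W_dec"]

def predict_latent(params, z, u):
    """One-step prediction in feature space from features z and controls u."""
    return z @ params["A"] + u @ params["B"]

def joint_loss(params, images, controls):
    """Sum of reconstruction and one-step prediction error.

    images: (T, D) image sequence, controls: (T - 1, U) applied controls.
    """
    z = encode(params, images)
    # Static term: how well each image is reconstructed from its own features.
    recon_err = np.mean((decode(params, z) - images) ** 2)
    # Dynamic term: how well the next image is predicted via the feature space.
    z_next = predict_latent(params, z[:-1], controls)
    pred_err = np.mean((decode(params, z_next) - images[1:]) ** 2)
    return recon_err + pred_err, recon_err, pred_err

# Random placeholder data, used only to exercise the shapes.
T = 10
images = rng.normal(size=(T, D))
controls = rng.normal(size=(T - 1, U))
total, recon_err, pred_err = joint_loss(params, images, controls)
```

Minimizing only the reconstruction term would yield features that compress images well but need not be predictable over time. Including the prediction term in the same objective is what ties the embedding to the dynamics in this sketch.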

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
cs.LG 2019-11 accept novelty 8.0

MuZero matches or exceeds AlphaZero-level performance in Go, Chess, and Shogi, and sets a new state of the art on 57 Atari games by learning a model that directly supports planning rather than reconstructing full environme...

Continuous control with deep reinforcement learning
cs.LG 2015-09 accept novelty 7.0

DDPG is a model-free actor-critic algorithm that learns continuous control policies end-to-end from states or pixels using deterministic policy gradients and deep networks, solving more than 20 physics tasks competiti...

Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight
cs.RO 2025-01 unverdicted novelty 6.0

DreamerV3 enables pixel-to-control policies for drone racing that reach 9 m/s in both simulation and real hardware-in-the-loop tests.