Translating Videos to Commands for Robotic Manipulation with Deep Recurrent Neural Networks

Anh Nguyen; Darwin G. Caldwell; Dimitrios Kanoulas; Luca Muratore; Nikos G. Tsagarakis

arxiv: 1710.00290 · v1 · pith:6G4DONAYnew · submitted 2017-10-01 · 💻 cs.RO · cs.CV

keywords: deep, manipulation, networks, neural, commands, demonstrate, features, framework

We present a new method to translate videos to commands for robotic manipulation using Deep Recurrent Neural Networks (RNN). Our framework first extracts deep features from the input video frames with a deep Convolutional Neural Network (CNN). Two RNN layers with an encoder-decoder architecture are then used to encode the visual features and sequentially generate the output words as the command. We demonstrate that the translation accuracy can be improved by allowing a smooth transition between the two RNN layers and by using a state-of-the-art feature extractor. The experimental results on our new challenging dataset show that our approach outperforms recent methods by a fair margin. Furthermore, we combine the proposed translation module with a vision and planning system to let a robot perform various manipulation tasks. Finally, we demonstrate the effectiveness of our framework on the full-size humanoid robot WALK-MAN.
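
The encoder-decoder flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain tanh RNN cells with untrained random weights, a made-up seven-word vocabulary, and hand-written 3-dimensional vectors standing in for CNN features. Passing the encoder's final hidden state to the decoder is one common way to connect the two layers and is an assumption here. All names (`VOCAB`, `translate`, `rnn_step`) are hypothetical.

```python
import math
import random

# Hypothetical toy vocabulary; the paper's command vocabulary is not given here.
VOCAB = ["<eos>", "right", "hand", "pour", "water", "grasp", "cup"]


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(w_in, w_rec, x, h):
    """One plain tanh RNN step: h' = tanh(W_in x + W_rec h)."""
    a = matvec(w_in, x)
    b = matvec(w_rec, h)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


def translate(frame_features, hidden=8, max_words=5, seed=0):
    """Encode per-frame features, then greedily decode a word sequence."""
    rng = random.Random(seed)
    feat_dim = len(frame_features[0])
    enc_in = make_matrix(hidden, feat_dim, rng)
    enc_rec = make_matrix(hidden, hidden, rng)
    dec_in = make_matrix(hidden, len(VOCAB), rng)
    dec_rec = make_matrix(hidden, hidden, rng)
    w_out = make_matrix(len(VOCAB), hidden, rng)

    # Encoder: fold the feature sequence into a single hidden state.
    h = [0.0] * hidden
    for x in frame_features:
        h = rnn_step(enc_in, enc_rec, x, h)

    # Decoder: start from the encoder state and emit one word per step.
    words = []
    prev = [0.0] * len(VOCAB)  # all-zero "start" input
    for _ in range(max_words):
        h = rnn_step(dec_in, dec_rec, prev, h)
        scores = matvec(w_out, h)
        idx = max(range(len(VOCAB)), key=scores.__getitem__)
        if VOCAB[idx] == "<eos>":
            break
        words.append(VOCAB[idx])
        # Feed the chosen word back in as a one-hot vector.
        prev = [1.0 if i == idx else 0.0 for i in range(len(VOCAB))]
    return words


# Stand-in for CNN features: 4 frames of made-up 3-dimensional vectors.
features = [[0.1, 0.4, -0.2], [0.3, 0.1, 0.0], [-0.1, 0.2, 0.5], [0.0, -0.3, 0.2]]
command = translate(features)
```

Because the weights are random, the decoded words are meaningless; the sketch only shows the data flow from frame features through the encoder into a word-by-word decoder.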
