Extending the OpenAI Gym for robotics: a toolkit for reinforcement learning using ROS and Gazebo

Alejandro Hernandez Cordero; Iker Zamora; Nestor Gonzalez Lopez; Victor Mayoral Vilches

arxiv: 1608.05742 · v2 · pith:2H3IMCDHnew · submitted 2016-08-19 · 💻 cs.RO

Iker Zamora, Nestor Gonzalez Lopez, Victor Mayoral Vilches, Alejandro Hernandez Cordero

classification 💻 cs.RO

keywords robotics, gazebo, learning, openai, presents, reinforcement, system, techniques

This paper presents an extension of the OpenAI Gym for robotics using the Robot Operating System (ROS) and the Gazebo simulator. It discusses the proposed software architecture and the results obtained with two reinforcement learning techniques: Q-Learning and Sarsa. The outcome of this work is a benchmarking system for robotics that allows different techniques and algorithms to be compared under the same virtual conditions.
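
The abstract names Q-Learning and Sarsa without stating how they differ. As a rough illustration, the textbook tabular update rules can be sketched as below. This is a minimal sketch, not the paper's implementation: the function names, the dictionary-based Q-table, and the default values of `alpha` and `gamma` are all assumptions chosen for the example.

```python
from collections import defaultdict


def q_learning_update(q, s, a, r, s_next, n_actions, alpha=0.2, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(q[(s_next, b)] for b in range(n_actions))
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.2, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    q[(s, a)] += alpha * (r + gamma * q[(s_next, a_next)] - q[(s, a)])


def make_table():
    """Hypothetical Q-table: (state, action) -> value, defaulting to 0.0."""
    table = defaultdict(float)
    table[(1, 0)] = 1.0  # value of action 0 in next state 1
    table[(1, 1)] = 2.0  # value of action 1 in next state 1
    return table


# Same transition (s=0, a=0, r=1.0, s_next=1) under both rules.
q_off = make_table()
q_learning_update(q_off, 0, 0, 1.0, 1, n_actions=2, alpha=0.5, gamma=0.5)

q_on = make_table()
sarsa_update(q_on, 0, 0, 1.0, 1, a_next=0, alpha=0.5, gamma=0.5)
```

The only difference is the bootstrap term: Q-Learning uses the maximum value over next actions regardless of what the agent does next, while Sarsa uses the value of the next action the policy actually selects. Running both under identical simulated conditions is the kind of comparison the benchmarking system is meant to support.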

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

ARMATA: Auto-Regressive Multi-Agent Task Assignment
cs.MA 2026-05 unverdicted novelty 5.0

ARMATA is a new end-to-end autoregressive model with multi-stage decoding that unifies allocation and routing for multi-agent systems and reports up to 20% better solutions than OR-Tools, CPLEX, and LKH-3 in seconds i...

RouteFormer: A Transformer-Based Routing Framework for Autonomous Vehicles
cs.RO 2025-04 unverdicted novelty 5.0

RouteFormer is a transformer-RL hybrid for single-agent graph routing that reports 10% and 7% shorter distances than Concorde and LKH-3 on mission-like graphs by incorporating constraints the solvers ignore.

Planning Robot Motion using Deep Visual Prediction
cs.RO 2019-06 unverdicted novelty 3.0

PROM-Net performs unsupervised visual prediction of robot motion from raw frames and integrates the predictions into model predictive control for navigation in unknown dynamic settings.