Benchmarking Action Spaces in Reinforcement Learning for Vision-based Robotic Manipulation
Pith reviewed 2026-06-26 21:21 UTC · model grok-4.3
The pith
Joint velocity action spaces outperform pose increments, pose velocity, and joint position increments for vision-based robotic picking and pushing when policies transfer from simulation to real robots.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Among the four representations tested, joint velocity yields the best combination of task completion rates and motion smoothness for both object picking and object pushing once policies move from simulation to the physical robot.
What carries the argument
Benchmark comparison of four action space formulations (pose increment, pose velocity, joint position increment, joint velocity) on vision-based picking and pushing with sim-to-real transfer.
If this is right
- Joint velocity produces smoother real-robot trajectories than the other three spaces.
- Policies using joint velocity reach higher final success rates after sim-to-real transfer.
- Action space choice measurably changes the reliability of sim-to-real transfer for vision-based tasks.
- Practical selection of joint velocity can reduce the need for extra safety filtering or post-processing.
Where Pith is reading between the lines
- If joint velocity remains superior on additional tasks with similar dynamics, it could become a default choice for many vision-based manipulation problems.
- The ranking might shift on robots whose kinematic properties differ substantially from the one used here.
- Extending the same benchmark to tasks requiring precise force control could expose limits of velocity-based representations.
Load-bearing premise
The two chosen tasks together with the specific sim-to-real protocol are representative of vision-based manipulation problems in general.
What would settle it
Repeating the identical training and transfer protocol on a third task such as stacking blocks or on a robot with different joint configuration and checking whether joint velocity still ranks first in success and smoothness.
Figures
read the original abstract
In real-world reinforcement learning (RL), the choice of action space can play a key role in shaping motion smoothness, safety, and overall task performance. In this study, we evaluate pose increment, pose velocity, joint position increment, and joint velocity across two vision-based manipulation tasks: object picking and pushing. We train policies in simulation and deploy them to the real world using sim-to-real transfer. We find that action-space representation indeed significantly affects sim-to-real performance. In particular, we find that the joint velocity action space is best for the vision-based picking and pushing tasks in terms of smoothness and final task performance. We also provide practical guidance for RL practitioners in choosing action spaces for both simulation and real-world experiments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript empirically benchmarks four action spaces (pose increment, pose velocity, joint position increment, joint velocity) for vision-based RL policies on two manipulation tasks (object picking and pushing). Policies are trained in simulation and transferred to the real robot; the central claim is that joint velocity yields the best combination of motion smoothness and task performance, with additional practical guidance offered to practitioners.
Significance. If the reported ranking is supported by detailed quantitative results, this work would supply actionable, task-specific evidence on action-space choice for sim-to-real robotic RL, a topic of direct relevance to motion quality and deployment safety.
major comments (1)
- [Abstract] Abstract: the claim that joint velocity is best for smoothness and final task performance is stated without any quantitative metrics, statistical tests, number of trials, or description of how smoothness and success were measured. This prevents verification that the data support the ranking.
Simulated Author's Rebuttal
We thank the referee for the constructive comment regarding the abstract. We address it directly below.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that joint velocity is best for smoothness and final task performance is stated without any quantitative metrics, statistical tests, number of trials, or description of how smoothness and success were measured. This prevents verification that the data support the ranking.
Authors: We agree that the abstract in its current form is too concise and does not include supporting quantitative details. The main body of the manuscript (Sections 4 and 5) already reports success rates, smoothness metrics (e.g., mean squared jerk), number of trials (typically 10 real-world rollouts per action space), and the exact measurement protocols for both simulation and real-robot evaluation. In the revised manuscript we will expand the abstract to include the key quantitative results and a brief statement of how smoothness and success were quantified, while preserving the word limit. revision: yes
Circularity Check
No significant circularity
full rationale
This is a purely empirical benchmarking study that trains RL policies in simulation for four action spaces, evaluates them on two tasks, and reports sim-to-real transfer results. No derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described content. The central claim is an observed ranking from direct experiments, with no load-bearing step that reduces to its own inputs by construction. The work is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Standard assumptions in RL training and sim-to-real transfer hold across the tested action spaces
Reference graph
Works this paper leans on
-
[1]
Setting up a reinforcement learning task with a real-world robot,
A. R. Mahmood, D. Korenkevych, B. J. Komer, and J. Bergstra, “Setting up a reinforcement learning task with a real-world robot,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018
2018
-
[2]
Benchmarking reinforcement learning algorithms on real-world robots,
A. R. Mahmood, D. Korenkevych, G. Vasan, W. Ma, and J. Bergstra, “Benchmarking reinforcement learning algorithms on real-world robots,” inConference on Robot Learning (CoRL), 2018, pp. 561– 591
2018
-
[3]
Sim-to- real transfer of robotic control with dynamics randomization,
X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to- real transfer of robotic control with dynamics randomization,” inIEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 1–8
2018
-
[4]
On the role of the action space in robot manipulation learning and sim-to-real transfer,
E. Aljalbout, F. Frank, M. Karl, and P. van der Smagt, “On the role of the action space in robot manipulation learning and sim-to-real transfer,”IEEE Robotics and Automation Letters, vol. 9, no. 6, pp. 5895–5902, 2024
2024
-
[5]
Torque-based deep reinforcement learning for task- and robot-agnostic learning on bipedal robots using sim-to-real transfer,
D. Kim, G. Berseth, M. Schwartz, and J. Park, “Torque-based deep reinforcement learning for task- and robot-agnostic learning on bipedal robots using sim-to-real transfer,”IEEE Robotics and Automation Letters, vol. 8, no. 10, pp. 6251–6258, 2023
2023
-
[6]
Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks,
R. Mart ´ın-Mart´ın, M. A. Lee, R. Gardner, S. Savarese, J. Bohg, and A. Garg, “Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks,” inIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019, pp. 1010–1017
2019
-
[7]
A comparison of action spaces for learning manipulation tasks,
P. Varin, L. Grossman, and S. Kuindersma, “A comparison of action spaces for learning manipulation tasks,”arXiv preprint arXiv:1908.08659, 2019
-
[8]
K. Zakka, B. Tabanpour, Q. Liao, M. Haiderbhai, S. Holt, J. Y . Luo, A. Allshire, E. Frey, K. Sreenath, L. A. Kahrs, C. Sferrazza, Y . Tassa, and P. Abbeel, “MuJoCo playground,”arXiv preprint arXiv:2502.08844, 2025
-
[9]
Open-source reinforcement learning environments implemented in MuJoCo with franka manipulator,
Z. Xu, Y . Li, X. Yang, Z. Zhao, L. Zhuang, and J. Zhao, “Open-source reinforcement learning environments implemented in MuJoCo with franka manipulator,” inIEEE International Conference on Advanced Intelligent Mechatronics (AIM), 2024, pp. 709–714
2024
-
[10]
Learn- ing hand-eye coordination for robotic grasping with deep learning and large-scale data collection,
S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen, “Learn- ing hand-eye coordination for robotic grasping with deep learning and large-scale data collection,”International Journal of Robotics Research, vol. 37, no. 4-5, pp. 421–436, 2018
2018
-
[11]
Making sense of vision and touch: Self- supervised learning of multimodal representations for contact-rich tasks,
M. A. Lee, Y . Zhu, K. Srinivasan, P. Shah, S. Savarese, L. Fei-Fei, A. Garg, and J. Bohg, “Making sense of vision and touch: Self- supervised learning of multimodal representations for contact-rich tasks,” inIEEE International Conference on Robotics and Automation (ICRA), 2019, pp. 8943–8950
2019
-
[12]
Domain randomization for transferring deep neural networks from simulation to the real world,
J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” inIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 23–30
2017
-
[13]
Proximal Policy Optimization Algorithms
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,”arXiv preprint arXiv:1707.06347, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[14]
Brax: A differentiable physics engine for large-scale rigid-body simulation,
C. D. Freeman, E. Frey, A. Raichuk, S. Girgin, I. Mordatch, and O. Bachem, “Brax: A differentiable physics engine for large-scale rigid-body simulation,” 2021. [Online]. Available: https: //github.com/google/brax
2021
-
[15]
Sim-to-real: Learning agile locomotion for quadruped robots,
J. Tan, T. Zhang, E. Coumans, A. Iscen, Y . Bai, D. Hafner, S. Bo- hez, and V . Vanhoucke, “Sim-to-real: Learning agile locomotion for quadruped robots,” inRobotics: Science and Systems (RSS), 2018
2018
-
[16]
Closing the sim-to-real loop: Adapting simulation randomization with real world experience,
Y . Chebotar, A. Handa, V . Makoviychuk, M. Macklin, J. Issac, N. D. Ratliff, and D. Fox, “Closing the sim-to-real loop: Adapting simulation randomization with real world experience,” inIEEE International Conference on Robotics and Automation (ICRA), 2019, pp. 8973– 8979
2019
-
[17]
Solving Rubik's Cube with a Robot Hand
OpenAI, I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, J. Schneider, N. Tezak, J. Tworek, P. Welinder, L. Weng, Q. Yuan, W. Zaremba, and L. Zhang, “Solving rubik’s cube with a robot hand,” arXiv preprint arXiv:1910.07113, 2019
work page internal anchor Pith review Pith/arXiv arXiv 1910
-
[18]
Learning locomotion skills using deeprl: Does the choice of action space matter?
X. B. Peng and M. van de Panne, “Learning locomotion skills using deeprl: Does the choice of action space matter?” inACM SIGGRAPH / Eurographics Symposium on Computer Animation (SCA), 2017, pp. 12:1–12:13
2017
-
[19]
Learning torque control for quadrupedal locomotion,
S. Chen, B. Zhang, M. W. Mueller, A. Rai, and K. Sreenath, “Learning torque control for quadrupedal locomotion,” inIEEE-RAS Interna- tional Conference on Humanoid Robots (Humanoids), 2023, pp. 1–8
2023
-
[20]
Investigating the impact of action representations in policy gradient algorithms,
J. Schneider, P. Schumacher, D. F. B. H ¨aufle, B. Sch ¨olkopf, and D. B ¨uchler, “Investigating the impact of action representations in policy gradient algorithms,”arXiv preprint arXiv:2309.06921, 2023
-
[21]
dm control: Software and tasks for continuous control,
S. Tunyasuvunakool, A. Muldal, Y . Doron, S. Liu, S. Bohez, J. Merel, T. Erez, T. Lillicrap, N. Heess, and Y . Tassa, “dm control: Software and tasks for continuous control,”Software Impacts, vol. 6, p. 100022, 2020
2020
-
[22]
MuJoCo: A physics engine for model-based control,
E. Todorov, T. Erez, and Y . Tassa, “MuJoCo: A physics engine for model-based control,” inIEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 5026–5033
2012
-
[23]
Franka ros interface: A ros/python api for controlling and managing the franka emika panda robot (real and simulated)
S. Sidhik, “Franka ros interface: A ros/python api for controlling and managing the franka emika panda robot (real and simulated).” 2020
2020
-
[24]
DexPBT: Scaling up Dexterous Manipulation for Hand-Arm Systems with Population Based Training,
A. Petrenko, A. Allshire, G. State, A. Handa, and V . Makoviychuk, “DexPBT: Scaling up Dexterous Manipulation for Hand-Arm Systems with Population Based Training,” inIEEE International Conference on Robotics and Automation (ICRA), 2023
2023
-
[25]
Analytical inverse kinematics for franka emika panda: A geometrical solver for 7-dof manipulators with unconven- tional design,
Y . He and S. Liu, “Analytical inverse kinematics for franka emika panda: A geometrical solver for 7-dof manipulators with unconven- tional design,” inInternational Conference on Control, Mechatronics and Automation (ICCMA), 2021, pp. 194–199
2021
-
[26]
Understanding domain randomization for sim-to-real transfer,
X. Chen, J. Hu, C. Jin, L. Li, and L. Wang, “Understanding domain randomization for sim-to-real transfer,” inTenth International Confer- ence on Learning Representations (ICLR), 2022
2022
-
[27]
Performance Variation in Deep Reinforcement Learning
H. Tanaka and A. R. Mahmood, “Performance variation in deep reinforcement learning,”arXiv preprint arXiv:2606.06746, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[28]
General and efficient visual goal-conditioned reinforcement learning using object-agnostic masks,
F. Shahriar, C. Wang, A. Azimi, G. Vasan, H. H. Elanwar, A. R. Mah- mood, and C. Bellinger, “General and efficient visual goal-conditioned reinforcement learning using object-agnostic masks,”arXiv preprint arXiv:2510.06277, 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.