Benchmarking Action Spaces in Reinforcement Learning for Vision-based Robotic Manipulation

Abhishek Naik; A. Rupam Mahmood; Colin Bellinger; Homayoon Farrahi; Seyed Alireza Azimi

arxiv: 2606.18594 · v1 · pith:5PTBAZZGnew · submitted 2026-06-17 · 💻 cs.RO · cs.AI

Seyed Alireza Azimi, Homayoon Farrahi, Abhishek Naik, Colin Bellinger, A. Rupam Mahmood

Pith reviewed 2026-06-26 21:21 UTC · model grok-4.3

classification 💻 cs.RO cs.AI

keywords reinforcement learning · action spaces · robotic manipulation · sim-to-real transfer · vision-based control · picking · pushing · motion smoothness

The pith

Joint velocity action spaces outperform pose increments, pose velocity, and joint position increments for vision-based robotic picking and pushing when policies transfer from simulation to real robots.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests four action space options in reinforcement learning policies for two standard manipulation tasks. All policies receive camera images as input and are first trained in simulation before direct deployment on physical hardware. Performance is measured by both final task success and the smoothness of the resulting motions. Joint velocity produces the highest success rates alongside the smoothest trajectories. The results supply direct guidance on which representation to select when moving from simulated training to real-world execution.

Core claim

Among the four representations tested, joint velocity yields the best combination of task completion rates and motion smoothness for both object picking and object pushing once policies move from simulation to the physical robot.

What carries the argument

Benchmark comparison of four action space formulations (pose increment, pose velocity, joint position increment, joint velocity) on vision-based picking and pushing with sim-to-real transfer.
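
To make the four formulations concrete, here is a minimal Python sketch of how each kind of action might be turned into a joint-level command. It is an illustration under stated assumptions, not the paper's implementation: the two-link planar arm, the function names, the normalized action range [-1, 1], and the scaling limits are all hypothetical, and the first-order Jacobian step stands in for whatever inverse-kinematics routine a real system would use.

```python
import math


def _clip(a):
    """Clip a normalized action component to [-1, 1] (assumed action range)."""
    return max(-1.0, min(1.0, a))


def joint_velocity_target(action, max_vel):
    """Joint velocity: scale the action directly into a joint velocity command."""
    return [_clip(a) * max_vel for a in action]


def joint_position_increment_target(q, action, max_step):
    """Joint position increment: offset the current joint positions q."""
    return [qi + _clip(a) * max_step for qi, a in zip(q, action)]


def planar_jacobian(q, l1=1.0, l2=1.0):
    """Position Jacobian of a hypothetical 2-link planar arm (end-effector x, y)."""
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def pose_velocity_to_joint_velocity(q, xdot):
    """Pose velocity: map a Cartesian velocity to joint velocities via the inverse Jacobian."""
    (a, b), (c, d) = planar_jacobian(q)
    det = a * d - b * c
    if abs(det) < 1e-6:
        raise ValueError("near-singular configuration; the inverse Jacobian is ill-defined")
    return [(d * xdot[0] - b * xdot[1]) / det,
            (-c * xdot[0] + a * xdot[1]) / det]


def pose_increment_to_joint_target(q, dx):
    """Pose increment: first-order step q + J^{-1} dx, a stand-in for a full IK solve."""
    dq = pose_velocity_to_joint_velocity(q, dx)
    return [qi + dqi for qi, dqi in zip(q, dq)]
```

The sketch shows one structural difference between the families: the joint-space actions are a scale or an offset of the policy output, while the pose-space actions pass through a kinematic inversion that can fail near singular configurations.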

If this is right

Joint velocity produces smoother real-robot trajectories than the other three spaces.
Policies using joint velocity reach higher final success rates after sim-to-real transfer.
Action space choice measurably changes the reliability of sim-to-real transfer for vision-based tasks.
Practical selection of joint velocity can reduce the need for extra safety filtering or post-processing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If joint velocity remains superior on additional tasks with similar dynamics, it could become a default choice for many vision-based manipulation problems.
The ranking might shift on robots whose kinematic properties differ substantially from the one used here.
Extending the same benchmark to tasks requiring precise force control could expose limits of velocity-based representations.

Load-bearing premise

The two chosen tasks together with the specific sim-to-real protocol are representative of vision-based manipulation problems in general.

What would settle it

Repeating the identical training and transfer protocol on a third task, such as stacking blocks, or on a robot with a different joint configuration, and checking whether joint velocity still ranks first in success and smoothness.

Figures

Figures reproduced from arXiv: 2606.18594 by Abhishek Naik, A. Rupam Mahmood, Colin Bellinger, Homayoon Farrahi, Seyed Alireza Azimi.

**Figure 1.** MuJoCo state transition. action is the action produced by the policy. q′_t and v′_t are the target joint positions and velocities, respectively. q_t and v_t are the current joint positions and velocities. τ_t is the applied torque and a_t is the current acceleration. q_{t+1} and v_{t+1} are the updated joint positions and velocities. Actuator. In our work, we consider two actuation models that are described as follow…

**Figure 3.** PandaPickCuboid simulation training results consisting of episodic return, success rate, and episodic length. 10 independent runs are shown with…

**Figure 4.** Real-world setup and wrist-mounted camera.

**Figure 6.** Simulation training metrics: floor collisions and jerk per step for picking and pushing tasks. 10 independent runs are displayed, and the median…

Original abstract

In real-world reinforcement learning (RL), the choice of action space can play a key role in shaping motion smoothness, safety, and overall task performance. In this study, we evaluate pose increment, pose velocity, joint position increment, and joint velocity across two vision-based manipulation tasks: object picking and pushing. We train policies in simulation and deploy them to the real world using sim-to-real transfer. We find that action-space representation indeed significantly affects sim-to-real performance. In particular, we find that the joint velocity action space is best for the vision-based picking and pushing tasks in terms of smoothness and final task performance. We also provide practical guidance for RL practitioners in choosing action spaces for both simulation and real-world experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Joint velocity comes out ahead on their two tasks for smoothness and success, but the lack of numbers and stats in the abstract leaves the size of the effect unclear.

Joint velocity action space performed best for their vision-based picking and pushing tasks under sim-to-real transfer. The paper runs a head-to-head test of four representations (pose increment, pose velocity, joint position increment, and joint velocity) on two manipulation problems and reports that the joint velocity choice gave smoother motion and higher final performance.

The useful part is the direct comparison itself. Most RL work in robotics picks an action space without showing why, so seeing all four evaluated on the same vision-based setup with real-robot transfer adds concrete data. The closing section also gives practical pointers for sim and real experiments, which matches what practitioners often need.

The main limitation is that the abstract states the ranking without success rates, smoothness scores, trial counts, or statistical tests. That makes it hard to judge how reliable or large the differences are. The study also covers only two tasks on one transfer protocol, so the ranking may not carry over to other manipulation problems or hardware.
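
One way to see why trial counts matter is to put an interval around a success rate. The sketch below computes a Wilson score interval in Python; the function name and the default 95% level are choices made for illustration, and the counts in the note that follows are hypothetical, not results from the paper.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 is roughly 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half
```

With a hypothetical 8 successes out of 10 trials for one action space and 6 out of 10 for another, the two intervals returned by `wilson_interval(8, 10)` and `wilson_interval(6, 10)` overlap substantially, so a ranking based on counts of that size alone would be weak evidence.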

This is aimed at robotic RL users who have to choose an action space for sim-to-real work. It is not a theoretical advance, but the empirical focus is solid enough that a referee could check the methods and results in detail. I would send it for peer review.

Referee Report

1 major / 0 minor

Summary. The manuscript empirically benchmarks four action spaces (pose increment, pose velocity, joint position increment, joint velocity) for vision-based RL policies on two manipulation tasks (object picking and pushing). Policies are trained in simulation and transferred to the real robot; the central claim is that joint velocity yields the best combination of motion smoothness and task performance, with additional practical guidance offered to practitioners.

Significance. If the reported ranking is supported by detailed quantitative results, this work would supply actionable, task-specific evidence on action-space choice for sim-to-real robotic RL, a topic of direct relevance to motion quality and deployment safety.

major comments (1)

[Abstract] The claim that joint velocity is best for smoothness and final task performance is stated without any quantitative metrics, statistical tests, number of trials, or description of how smoothness and success were measured. This prevents verification that the data support the ranking.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment regarding the abstract. We address it directly below.

Referee: [Abstract] The claim that joint velocity is best for smoothness and final task performance is stated without any quantitative metrics, statistical tests, number of trials, or description of how smoothness and success were measured. This prevents verification that the data support the ranking.

Authors: We agree that the abstract in its current form is too concise and does not include supporting quantitative details. The main body of the manuscript (Sections 4 and 5) already reports success rates, smoothness metrics (e.g., mean squared jerk), number of trials (typically 10 real-world rollouts per action space), and the exact measurement protocols for both simulation and real-robot evaluation. In the revised manuscript we will expand the abstract to include the key quantitative results and a brief statement of how smoothness and success were quantified, while preserving the word limit. Revision: yes
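
The rebuttal names mean squared jerk as one smoothness metric. As a rough sketch of how such a number could be computed from logged joint positions, the Python below uses a third-order finite difference. The function name, the uniform sampling period `dt`, and the averaging over all joints and time steps are assumptions made for illustration; the paper's exact definition may differ.

```python
def mean_squared_jerk(positions, dt):
    """Mean squared jerk of a sampled joint trajectory.

    positions: list of per-step lists of joint positions, sampled every dt seconds.
    Jerk is approximated by the third-order forward difference divided by dt**3.
    """
    if len(positions) < 4:
        raise ValueError("need at least 4 samples for a third-order difference")
    n_joints = len(positions[0])
    total, count = 0.0, 0
    for t in range(len(positions) - 3):
        for j in range(n_joints):
            third_diff = (positions[t + 3][j] - 3 * positions[t + 2][j]
                          + 3 * positions[t + 1][j] - positions[t][j])
            jerk = third_diff / dt ** 3
            total += jerk * jerk
            count += 1
    return total / count
```

A constant-acceleration trajectory scores zero under this measure, while abrupt changes in acceleration raise it, which is why lower values are read as smoother motion.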

Circularity Check

0 steps flagged

No significant circularity

This is a purely empirical benchmarking study that trains RL policies in simulation for four action spaces, evaluates them on two tasks, and reports sim-to-real transfer results. No derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described content. The central claim is an observed ranking from direct experiments, with no load-bearing step that reduces to its own inputs by construction. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Empirical benchmarking study; no mathematical derivations or new entities introduced. Relies on standard RL training assumptions and sim-to-real transfer validity.

axioms (1)

Domain assumption: standard assumptions in RL training and sim-to-real transfer hold across the tested action spaces.
The study reports performance differences after transfer without detailing how transfer success was ensured or controlled for each action space.

Reference graph

Works this paper leans on

28 extracted references · 7 canonical work pages · 3 internal anchors

[1]

Setting up a reinforcement learning task with a real-world robot,

A. R. Mahmood, D. Korenkevych, B. J. Komer, and J. Bergstra, “Setting up a reinforcement learning task with a real-world robot,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018

2018
[2]

Benchmarking reinforcement learning algorithms on real-world robots,

A. R. Mahmood, D. Korenkevych, G. Vasan, W. Ma, and J. Bergstra, “Benchmarking reinforcement learning algorithms on real-world robots,” in Conference on Robot Learning (CoRL), 2018, pp. 561–591

2018
[3]

Sim-to-real transfer of robotic control with dynamics randomization,

X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to-real transfer of robotic control with dynamics randomization,” in IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 1–8

2018
[4]

On the role of the action space in robot manipulation learning and sim-to-real transfer,

E. Aljalbout, F. Frank, M. Karl, and P. van der Smagt, “On the role of the action space in robot manipulation learning and sim-to-real transfer,” IEEE Robotics and Automation Letters, vol. 9, no. 6, pp. 5895–5902, 2024

2024
[5]

Torque-based deep reinforcement learning for task- and robot-agnostic learning on bipedal robots using sim-to-real transfer,

D. Kim, G. Berseth, M. Schwartz, and J. Park, “Torque-based deep reinforcement learning for task- and robot-agnostic learning on bipedal robots using sim-to-real transfer,” IEEE Robotics and Automation Letters, vol. 8, no. 10, pp. 6251–6258, 2023

2023
[6]

Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks,

R. Martín-Martín, M. A. Lee, R. Gardner, S. Savarese, J. Bohg, and A. Garg, “Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019, pp. 1010–1017

2019
[7]

A comparison of action spaces for learning manipulation tasks,

P. Varin, L. Grossman, and S. Kuindersma, “A comparison of action spaces for learning manipulation tasks,” arXiv preprint arXiv:1908.08659, 2019

work page arXiv 1908
[8]

MuJoCo playground

K. Zakka, B. Tabanpour, Q. Liao, M. Haiderbhai, S. Holt, J. Y. Luo, A. Allshire, E. Frey, K. Sreenath, L. A. Kahrs, C. Sferrazza, Y. Tassa, and P. Abbeel, “MuJoCo playground,” arXiv preprint arXiv:2502.08844, 2025

work page arXiv 2025
[9]

Open-source reinforcement learning environments implemented in MuJoCo with franka manipulator,

Z. Xu, Y. Li, X. Yang, Z. Zhao, L. Zhuang, and J. Zhao, “Open-source reinforcement learning environments implemented in MuJoCo with franka manipulator,” in IEEE International Conference on Advanced Intelligent Mechatronics (AIM), 2024, pp. 709–714

2024
[10]

Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection,

S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen, “Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection,” International Journal of Robotics Research, vol. 37, no. 4-5, pp. 421–436, 2018

2018
[11]

Making sense of vision and touch: Self-supervised learning of multimodal representations for contact-rich tasks,

M. A. Lee, Y. Zhu, K. Srinivasan, P. Shah, S. Savarese, L. Fei-Fei, A. Garg, and J. Bohg, “Making sense of vision and touch: Self-supervised learning of multimodal representations for contact-rich tasks,” in IEEE International Conference on Robotics and Automation (ICRA), 2019, pp. 8943–8950

2019
[12]

Domain randomization for transferring deep neural networks from simulation to the real world,

J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 23–30

2017
[13]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[14]

Brax: A differentiable physics engine for large-scale rigid-body simulation,

C. D. Freeman, E. Frey, A. Raichuk, S. Girgin, I. Mordatch, and O. Bachem, “Brax: A differentiable physics engine for large-scale rigid-body simulation,” 2021. [Online]. Available: https://github.com/google/brax

2021
[15]

Sim-to-real: Learning agile locomotion for quadruped robots,

J. Tan, T. Zhang, E. Coumans, A. Iscen, Y. Bai, D. Hafner, S. Bohez, and V. Vanhoucke, “Sim-to-real: Learning agile locomotion for quadruped robots,” in Robotics: Science and Systems (RSS), 2018

2018
[16]

Closing the sim-to-real loop: Adapting simulation randomization with real world experience,

Y. Chebotar, A. Handa, V. Makoviychuk, M. Macklin, J. Issac, N. D. Ratliff, and D. Fox, “Closing the sim-to-real loop: Adapting simulation randomization with real world experience,” in IEEE International Conference on Robotics and Automation (ICRA), 2019, pp. 8973–8979

2019
[17]

Solving Rubik's Cube with a Robot Hand

OpenAI, I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, J. Schneider, N. Tezak, J. Tworek, P. Welinder, L. Weng, Q. Yuan, W. Zaremba, and L. Zhang, “Solving rubik’s cube with a robot hand,” arXiv preprint arXiv:1910.07113, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1910
[18]

Learning locomotion skills using deeprl: Does the choice of action space matter?

X. B. Peng and M. van de Panne, “Learning locomotion skills using deeprl: Does the choice of action space matter?” in ACM SIGGRAPH / Eurographics Symposium on Computer Animation (SCA), 2017, pp. 12:1–12:13

2017
[19]

Learning torque control for quadrupedal locomotion,

S. Chen, B. Zhang, M. W. Mueller, A. Rai, and K. Sreenath, “Learning torque control for quadrupedal locomotion,” in IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2023, pp. 1–8

2023
[20]

Investigating the impact of action representations in policy gradient algorithms,

J. Schneider, P. Schumacher, D. F. B. Häufle, B. Schölkopf, and D. Büchler, “Investigating the impact of action representations in policy gradient algorithms,” arXiv preprint arXiv:2309.06921, 2023

work page arXiv 2023
[21]

dm_control: Software and tasks for continuous control,

S. Tunyasuvunakool, A. Muldal, Y. Doron, S. Liu, S. Bohez, J. Merel, T. Erez, T. Lillicrap, N. Heess, and Y. Tassa, “dm_control: Software and tasks for continuous control,” Software Impacts, vol. 6, p. 100022, 2020

2020
[22]

MuJoCo: A physics engine for model-based control,

E. Todorov, T. Erez, and Y. Tassa, “MuJoCo: A physics engine for model-based control,” in IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 5026–5033

2012
[23]

Franka ros interface: A ros/python api for controlling and managing the franka emika panda robot (real and simulated)

S. Sidhik, “Franka ros interface: A ros/python api for controlling and managing the franka emika panda robot (real and simulated).” 2020

2020
[24]

DexPBT: Scaling up Dexterous Manipulation for Hand-Arm Systems with Population Based Training,

A. Petrenko, A. Allshire, G. State, A. Handa, and V. Makoviychuk, “DexPBT: Scaling up Dexterous Manipulation for Hand-Arm Systems with Population Based Training,” in IEEE International Conference on Robotics and Automation (ICRA), 2023

2023
[25]

Analytical inverse kinematics for franka emika panda: A geometrical solver for 7-dof manipulators with unconventional design,

Y. He and S. Liu, “Analytical inverse kinematics for franka emika panda: A geometrical solver for 7-dof manipulators with unconventional design,” in International Conference on Control, Mechatronics and Automation (ICCMA), 2021, pp. 194–199

2021
[26]

Understanding domain randomization for sim-to-real transfer,

X. Chen, J. Hu, C. Jin, L. Li, and L. Wang, “Understanding domain randomization for sim-to-real transfer,” in Tenth International Conference on Learning Representations (ICLR), 2022

2022
[27]

Performance Variation in Deep Reinforcement Learning

H. Tanaka and A. R. Mahmood, “Performance variation in deep reinforcement learning,”arXiv preprint arXiv:2606.06746, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[28]

General and efficient visual goal-conditioned reinforcement learning using object-agnostic masks,

F. Shahriar, C. Wang, A. Azimi, G. Vasan, H. H. Elanwar, A. R. Mahmood, and C. Bellinger, “General and efficient visual goal-conditioned reinforcement learning using object-agnostic masks,” arXiv preprint arXiv:2510.06277, 2025

work page arXiv 2025
