Analysis of Randomization Effects on Sim2Real Transfer in Reinforcement Learning for Robotic Manipulation Tasks

Alois Knoll; Bare Luka Žagar; Josip Josifovski; Mohammadhossein Malmir; Nicolás Navarro-Guerrero; Noah Klarmann

arxiv: 2206.06282 · v2 · submitted 2022-06-13 · 💻 cs.RO · cs.AI

Josip Josifovski, Mohammadhossein Malmir, Noah Klarmann, Bare Luka Žagar, Nicolás Navarro-Guerrero, Alois Knoll

Pith reviewed 2026-05-24 10:59 UTC · model grok-4.3


keywords: Sim2Real transfer, randomization, reinforcement learning, robotic manipulation, simulation to real transfer, fine-tuning, policy learning


The pith

More randomization in simulation improves Sim2Real transfer but can hinder policy learning in simulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes an easy-to-reproduce benchmark for testing Sim2Real transfer methods using a robotic reach-and-balance task. It compares four randomization strategies with three randomized parameters both in simulation and on a real robot. The central finding is that more randomization helps in Sim2Real transfer, yet it can also harm the ability of the algorithm to find a good policy in simulation. Fully randomized simulations and fine-tuning translate better to the real robot than the other approaches tested. A sympathetic reader would care because this provides a systematic way to evaluate randomization approaches without highly customized robotic systems.

Core claim

Our results show that more randomization helps in Sim2Real transfer, yet it can also harm the ability of the algorithm to find a good policy in simulation. Fully randomized simulations and fine-tuning show differentiated results and translate better to the real robot than the other approaches tested.

What carries the argument

The easy-to-reproduce experimental setup for the robotic reach-and-balance manipulator task, which serves as a benchmark for comparing four randomization strategies with three parameters.
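As a rough illustration of how such a comparison could be wired up, the sketch below toggles randomization per training strategy. Everything in it is an assumption made for illustration: this summary does not name the three randomized parameters, so `param_a`, `param_b`, and `param_c` are placeholders, the nominal values and ranges are invented, and `fine_tuning` is modeled as one plausible reading (train on the ideal simulation first, then continue with randomization enabled). The fourth strategy is not described here, so it is left out.

```python
import random

# Placeholder names: the summary says three parameters are randomized
# but does not name them, so these are hypothetical.
PARAMETERS = ("param_a", "param_b", "param_c")

# Hypothetical nominal values and sampling ranges, for illustration only.
NOMINAL = {"param_a": 0.0, "param_b": 0.0, "param_c": 0.0}
RANGES = {"param_a": (0.0, 1.0), "param_b": (0.0, 1.0), "param_c": (0.0, 1.0)}


def enabled_parameters(strategy, episode, switch_episode=100):
    """Return the set of parameters randomized in this episode.

    "ideal" randomizes nothing, "full" randomizes everything, and
    "fine_tuning" (an assumed reading) switches from ideal to full
    once `episode` reaches `switch_episode`.
    """
    if strategy == "ideal":
        return frozenset()
    if strategy == "full":
        return frozenset(PARAMETERS)
    if strategy == "fine_tuning":
        if episode < switch_episode:
            return frozenset()
        return frozenset(PARAMETERS)
    raise ValueError(f"unknown strategy: {strategy}")


def sample_episode_config(enabled, rng):
    """Draw enabled parameters uniformly from their range; keep the rest nominal."""
    config = {}
    for name in PARAMETERS:
        if name in enabled:
            low, high = RANGES[name]
            config[name] = rng.uniform(low, high)
        else:
            config[name] = NOMINAL[name]
    return config
```

Calling `sample_episode_config(enabled_parameters("fine_tuning", episode), random.Random(0))` at the start of each episode would give the simulator settings for that episode. Keeping the strategy logic separate from the sampling makes it straightforward to add further strategies as additional subsets of `PARAMETERS`.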

If this is right

More randomization helps in Sim2Real transfer.
Randomization can harm the ability to find a good policy in simulation.
Fully randomized simulations and fine-tuning translate better to the real robot.
The benchmark allows systematic comparison of randomization approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Adaptive randomization levels during training could balance the trade-off between sim learning and real transfer.
The findings may apply to other RL-based robotic tasks beyond manipulation.
Fine-tuning on real or partially randomized data might be essential for complex environments.
Testing with varied robot hardware could reveal if the results are task-specific.

Load-bearing premise

The defined reach-and-balance manipulator task and the four chosen randomization strategies with three parameters form a representative and generalizable benchmark for evaluating Sim2Real transfer methods.

What would settle it

A replication of the experiment on a different task, such as a more complex grasping or assembly task, that fails to show the same differentiated results for fully randomized simulations.

Figures

Figures reproduced from arXiv: 2206.06282 by Alois Knoll, Bare Luka Žagar, Josip Josifovski, Mohammadhossein Malmir, Nicolás Navarro-Guerrero, Noah Klarmann.

**Figure 1.** Methodology. a) Different training strategies. Each shape represents a different randomization parameter and whether it is enabled or not during…

**Figure 2.** Effects of different randomization parameters on the observation.

**Figure 4.** Learning evolution of the different strategies, measured as cumulative reward. The results are averaged over 5 different random initializations. Except for the fully randomized strategy, all strategies have a small standard deviation and converge to a comparable value, approximately 4% off the IK solver performance w.r.t. a random agent. The fully randomized strategy reaches a performance…

**Figure 5.** A lower joint velocity makes the problem more challenging because exploring the whole search space takes longer. However, the same trends as for the 1 rad/s training condition were observed. The fine-tuning strategy converges to a value comparable to the ideal simulation case and is approximately 5% off the baseline w.r.t. the random agent performance. The fully randomized strategy converges…
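The captions for Figures 4 and 5 quote gaps such as "approximately 4% off the IK solver performance w.r.t. a random agent". One plausible reading, which is an assumption and not a formula given in this summary, is that the gap between the baseline and the learned policy is divided by the gap between the baseline and a random agent. The sketch below computes that quantity together with the mean and standard deviation over seeds. All numbers are made up for illustration, and the function names are hypothetical.

```python
from statistics import mean, stdev


def normalized_gap(policy_return, baseline_return, random_return):
    """Fraction of the baseline-to-random span that the policy falls short by.

    Assumed definition: 0.0 means the policy matches the baseline
    (e.g. an IK solver), 1.0 means it is no better than a random agent.
    """
    span = baseline_return - random_return
    if span == 0:
        raise ValueError("baseline and random agent have identical returns")
    return (baseline_return - policy_return) / span


def summarize_seeds(final_returns):
    """Mean and sample standard deviation of final returns across seeds."""
    return mean(final_returns), stdev(final_returns)


# Made-up final returns for 5 random initializations of one strategy.
seed_returns = [95.0, 96.0, 97.0, 96.0, 96.0]
avg, spread = summarize_seeds(seed_returns)
gap = normalized_gap(avg, baseline_return=100.0, random_return=0.0)
```

Normalizing against both the baseline and the random agent makes gaps comparable across training conditions whose raw reward scales differ, such as the two joint-velocity settings.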

Original abstract

Randomization is currently a widely used approach in Sim2Real transfer for data-driven learning algorithms in robotics. Still, most Sim2Real studies report results for a specific randomization technique and often on a highly customized robotic system, making it difficult to evaluate different randomization approaches systematically. To address this problem, we define an easy-to-reproduce experimental setup for a robotic reach-and-balance manipulator task, which can serve as a benchmark for comparison. We compare four randomization strategies with three randomized parameters both in simulation and on a real robot. Our results show that more randomization helps in Sim2Real transfer, yet it can also harm the ability of the algorithm to find a good policy in simulation. Fully randomized simulations and fine-tuning show differentiated results and translate better to the real robot than the other approaches tested.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper sets up a simple reproducible benchmark for comparing randomization strategies in Sim2Real RL and runs a head-to-head on one reach-and-balance task, but the single-task scope keeps the results from traveling far.


The useful part is the benchmark itself. They define a reach-and-balance manipulator task that is meant to be easy to reproduce, then compare four randomization strategies with three randomized parameters, training in simulation and testing on the real robot. That controlled setup is more systematic than the usual custom-arm papers, and the real-robot measurements give the comparison some grounding. The reported pattern is a clear empirical observation from their runs: heavier randomization aids transfer but can make it harder to learn a good policy in sim, and the fully randomized and fine-tuning strategies transfer best.

Referee Report

2 major / 2 minor

Summary. The paper defines an easy-to-reproduce reach-and-balance manipulator task as a benchmark for evaluating Sim2Real transfer in reinforcement learning, compares four randomization strategies with three randomized parameters both in simulation and on a real robot, and reports that greater randomization improves transfer performance while potentially harming policy quality in simulation, with the fully randomized and fine-tuning strategies translating better to the real robot than the other tested approaches.

Significance. If the empirical trends hold under broader conditions, the work supplies a concrete, reproducible benchmark that addresses the field's reliance on highly customized robotic setups, enabling systematic comparisons of randomization techniques. The explicit trade-off finding between sim policy learning and real-world transfer is a useful observation for practitioners.

major comments (2)

[Abstract / Experimental Setup] Abstract and Experimental Setup: the central claim that the defined reach-and-balance task with four randomization strategies forms a representative benchmark for Sim2Real randomization effects rests on a single task whose dynamics (simple reach-and-balance) may not capture variance in contact-rich or higher-DoF manipulation; no justification or sensitivity analysis is supplied to support generalizability.
[Results] Results section: the reported performance differences between randomization strategies are presented without error bars, statistical significance tests, or raw data, so the strength of the claim that fully randomized simulations and fine-tuning 'translate better' cannot be assessed from the available evidence.

minor comments (2)

[Abstract] The abstract states empirical results but supplies no methods details; the full manuscript should ensure the methods section explicitly lists the RL algorithm, network architecture, training hyperparameters, and real-robot measurement protocol.
[Experimental Setup] Notation for the four randomization strategies and the three parameters should be introduced with a clear table or diagram early in the experimental section to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below, proposing revisions where they strengthen the manuscript while defending the paper's core positioning as an accessible benchmark.


Referee: [Abstract / Experimental Setup] Abstract and Experimental Setup: the central claim that the defined reach-and-balance task with four randomization strategies forms a representative benchmark for Sim2Real randomization effects rests on a single task whose dynamics (simple reach-and-balance) may not capture variance in contact-rich or higher-DoF manipulation; no justification or sensitivity analysis is supplied to support generalizability.

Authors: The manuscript explicitly frames the reach-and-balance task as an easy-to-reproduce benchmark to enable systematic comparisons of randomization strategies, addressing the field's reliance on customized setups rather than claiming broad representativeness across all manipulation domains. We will revise the abstract and experimental setup to add explicit justification for the task choice (simplicity and reproducibility) and a dedicated limitations paragraph discussing its scope, the absence of sensitivity analysis across task variants, and directions for extension to contact-rich or higher-DoF scenarios. revision: yes
Referee: [Results] Results section: the reported performance differences between randomization strategies are presented without error bars, statistical significance tests, or raw data, so the strength of the claim that fully randomized simulations and fine-tuning 'translate better' cannot be assessed from the available evidence.

Authors: We agree that the results presentation would be strengthened by quantitative support for the observed differences. In the revised version we will add error bars to all performance plots, include statistical significance tests (e.g., paired t-tests or Wilcoxon tests with p-values) comparing the randomization strategies, and release the raw experimental data together with the code repository to permit independent verification. revision: yes
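To make the kind of check proposed in this response concrete, here is a minimal standard-library sketch of a paired comparison between two strategies. It is not the authors' analysis: instead of the paired t-test or Wilcoxon test named above, it swaps in an exact sign-flip permutation test, a related nonparametric paired test that needs no external packages. It assumes that runs can be paired (for example by shared random seed), which this summary does not confirm. The function name and all scores are hypothetical.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Two-sided exact sign-flip permutation test on paired differences.

    Under the null hypothesis the sign of each paired difference is
    arbitrary, so every one of the 2**n sign assignments is equally likely.
    Returns the fraction of assignments whose absolute mean difference is
    at least as large as the observed one.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired test needs equally long score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = abs(mean([s * d for s, d in zip(signs, diffs)]))
        # Small tolerance so that ties count despite floating-point rounding.
        if flipped >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Made-up real-robot scores for two strategies, paired by seed.
strategy_a = [2.0, 3.0, 4.0, 5.0, 6.0]
strategy_b = [1.0, 1.0, 1.0, 1.0, 1.0]
p_value = paired_sign_flip_test(strategy_a, strategy_b)
```

With only five paired runs there are 32 sign assignments, and the observed assignment and its mirror image always count as extreme, so this two-sided test cannot return a p-value below 2/32 = 0.0625. That floor is worth keeping in mind when deciding how many seeds a comparison needs.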

Circularity Check

0 steps flagged

No circularity: purely empirical comparison with external real-robot validation


The paper defines a reach-and-balance task and four randomization strategies, then reports measured performance in simulation and on a physical robot. No equations, fitted parameters, or predictions are presented; results are direct experimental outcomes against an external benchmark (real hardware). No self-citation chains, ansatzes, or uniqueness claims underpin the central findings. The skeptic's concern about task representativeness is a question of external validity, not circularity in the reported chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical robotics study; no mathematical derivations, free parameters, axioms, or invented entities are introduced or required by the central claim.



[1] [1]

Playing Atari with Deep Reinforce- ment Learning,

V . Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with Deep Reinforce- ment Learning,” in NIPS: Deep Learning Workshop , 2013

work page 2013

[2] [2]

Mastering the Game of Go with Deep Neural Networks and Tree Search,

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V . Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the Game of Go with Deep Neural Networks and Tree Search,” Nature, vol....

work page 2016

[3] [3]

Emergent Tool Use from Multi-Agent Autocurric- ula,

B. Baker, I. Kanitscheider, T. Markov, Y . Wu, G. Powell, B. McGrew, and I. Mordatch, “Emergent Tool Use from Multi-Agent Autocurric- ula,” in International Conference on Learning Representations (ICLR), ser. Eight, Virtual from Addis Ababa, Ethiopia, 2020

work page 2020

[4] [4]

Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection,

S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen, “Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection,” The International Journal of Robotics Research, vol. 37, no. 4-5, pp. 421–436, 2018

work page 2018

[5] [5]

OpenAI Gym

G. Brockman, V . Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “OpenAI Gym,” arXiv:1606.01540 [cs] , 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[6] [6]

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning,

T. Yu, D. Quillen, Z. He, R. Julian, K. Hausman, C. Finn, and S. Levine, “Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning,” in Conference on Robot Learning (CoRL), vol. 100. Virtual Event: PMLR, 2020, pp. 1094–1100

work page 2020

[7] [7]

Isaac Gym: High Performance GPU Based Physics Simulation for Robot Learning,

V . Makoviychuk, L. Wawrzyniak, Y . Guo, M. Lu, K. Storey, M. Mack- lin, D. Hoeller, N. Rudin, A. Allshire, A. Handa, and G. State, “Isaac Gym: High Performance GPU Based Physics Simulation for Robot Learning,” in Conference on Neural Information Processing Systems (NeurIPS), ser. Datasets and Benchmarks Track, Virtual Event, 2021

work page 2021

[8] [8]

Connecting Artiﬁcial Brains to Robots in a Comprehensive Simulation Framework: The Neurorobotics Platform,

E. Falotico, L. Vannucci, A. Ambrosano, U. Albanese, S. Ulbrich, J. C. Vasquez Tieck, G. Hinkel, J. Kaiser, I. Peric, O. Denninger, N. Cauli, M. Kirtay, A. Roennau, G. Klinker, A. V on Arnim, L. Guyot, D. Peppicelli, P. Mart ´ınez-Ca˜nada, E. Ros, P. Maier, S. Weber, M. Huber, D. Plecher, F. R ¨ohrbein, S. Deser, A. Roitberg, P. van der Smagt, R. Dillman,...

work page 2017

[9] [9]

MuJoCo: A physics engine for model-based control,

E. Todorov, T. Erez, and Y . Tassa, “MuJoCo: A physics engine for model-based control,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , Vilamoura-Algarve, Portugal, 2012, pp. 5026–5033

work page 2012

[10] [10]

Robust Sim2Real Transfer by Learning Inverse Dynamics of Simulated Sys- tems,

M. Malmir, J. Josifovski, N. Klarmann, and A. Knoll, “Robust Sim2Real Transfer by Learning Inverse Dynamics of Simulated Sys- tems,” in 2nd R:SS Workshop on Closing the Reality Gap in Sim2Real Transfer for Robotics , Corvallis, OR, USA, 2020, p. 3

work page 2020

[11] [11]

Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey,

W. Zhao, J. P. Queralta, and T. Westerlund, “Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey,” in IEEE Symposium Series on Computational Intelligence (SSCI) , Canberra, ACT, Australia, 2020, pp. 737–744

work page 2020

[12] [12]

Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World,

J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , Vancouver, BC, Canada, 2017, pp. 23–30

work page 2017

[13] [13]

Noise and the Reality Gap: The Use of Simulation in Evolutionary Robotics,

N. Jakobi, P. Husbands, and I. Harvey, “Noise and the Reality Gap: The Use of Simulation in Evolutionary Robotics,” in European Conference on Artiﬁcial Life (ECAL) , ser. LNCS, vol. 929. Granada, Spain: Springer, 1995, pp. 704–720

work page 1995

[14] [14]

Running Across the Reality Gap: Octopod Locomotion Evolved in a Minimal Simulation,

N. Jakobi, “Running Across the Reality Gap: Octopod Locomotion Evolved in a Minimal Simulation,” in European Workshop on Evolu- tionary Robotics (EvoRobots) , ser. LNCS, vol. 1468. Paris, France: Springer, 1998, pp. 39–58

work page 1998

[15] [15]

Back to Reality: Crossing the Reality Gap in Evolutionary Robotics,

J. C. Zagal, J. Ruiz-del-Solar, and P. Vallejos, “Back to Reality: Crossing the Reality Gap in Evolutionary Robotics,”IFAC Proceedings Volumes, vol. 37, no. 8, pp. 834–839, 2004

work page 2004

[16] [16]

The Transferability Ap- proach: Crossing the Reality Gap in Evolutionary Robotics,

S. Koos, J.-B. Mouret, and S. Doncieux, “The Transferability Ap- proach: Crossing the Reality Gap in Evolutionary Robotics,” IEEE Transactions on Evolutionary Computation , vol. 17, no. 1, pp. 122– 145, 2013

work page 2013

[17] [17]

Object Detection and Pose Estimation Based on Convolutional Neural Networks Trained with Synthetic Data,

J. Josifovski, M. Kerzel, C. Pregizer, L. Posniak, and S. Wermter, “Object Detection and Pose Estimation Based on Convolutional Neural Networks Trained with Synthetic Data,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , Madrid, Spain, 2018, pp. 6269–6276

work page 2018

[18] [18]

3D Simulation for Robot Arm Control with Deep Q-Learning,

S. James and E. Johns, “3D Simulation for Robot Arm Control with Deep Q-Learning,” in NIPS Workshop: Deep Learning for Action and Interaction, Barcelona, Spain, 2016

work page 2016

[19] [19]

Sim2Real Transfer for Reinforcement Learning Without Dynamics Randomization,

M. Kaspar, J. D. Mu ˜noz Osorio, and J. Bock, “Sim2Real Transfer for Reinforcement Learning Without Dynamics Randomization,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV , USA, 2020, pp. 4383–4388

work page 2020

[20] [20]

Sim-to- Real Transfer of Robotic Control with Dynamics Randomization,

X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to- Real Transfer of Robotic Control with Dynamics Randomization,” in IEEE International Conference on Robotics and Automation (ICRA) , Brisbane, QLD, Australia, 2018, pp. 3803–3810

work page 2018

[21] [21]

Solving Rubik's Cube with a Robot Hand

OpenAI, I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, J. Schneider, N. Tezak, J. Tworek, P. Welinder, L. Weng, Q. Yuan, W. Zaremba, and L. Zhang, “Solving Rubik’s Cube with a Robot Hand,” arXiv:1910.07113 [cs, stat] , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1910

[22] [22]

Sim-to-Real Via Sim-to-Sim: Data-Efﬁcient Robotic Grasping Via Randomized- to-Canonical Adaptation Networks,

S. James, P. Wohlhart, M. Kalakrishnan, D. Kalashnikov, A. Irpan, J. Ibarz, S. Levine, R. Hadsell, and K. Bousmalis, “Sim-to-Real Via Sim-to-Sim: Data-Efﬁcient Robotic Grasping Via Randomized- to-Canonical Adaptation Networks,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , Long Beach, CA, USA, 2019, pp. 12 619–12 629

[23]

F. Muratore, F. Ramos, G. Turk, W. Yu, M. Gienger, and J. Peters, “Robot Learning from Randomized Simulations: A Review,” arXiv:2111.00956 [cs], 2022

[24]

A. A. Rusu, M. Večerík, T. Rothörl, N. Heess, R. Pascanu, and R. Hadsell, “Sim-to-Real Robot Learning from Pixels with Progressive Nets,” in Annual Conference on Robot Learning (CoRL), vol. 78. Mountain View, CA, USA: PMLR, 2017, pp. 262–270

[25]

A. Kadian, J. Truong, A. Gokaslan, A. Clegg, E. Wijmans, S. Lee, M. Savva, S. Chernova, and D. Batra, “Sim2Real Predictivity: Does Evaluation in Simulation Predict Real-World Performance?” in 2nd R:SS Workshop on Closing the Reality Gap in Sim2Real Transfer for Robotics, Corvallis, OR, USA, 2020, p. 3

[26]

“Open ai reacher-v2 environment,” https://gym.openai.com/envs/Reacher-v2/, accessed: 2022-2-23

[27]

N. Navarro-Guerrero, R. Lowe, and S. Wermter, “Improving Robot Motor Learning with Negatively Valenced Reinforcement Signals,” Frontiers in Neurorobotics, vol. 11, no. 10, 2017

[28]

C. Stahlhut, N. Navarro-Guerrero, C. Weber, and S. Wermter, “Interaction in Reinforcement Learning Reduces the Need for Finely Tuned Hyperparameters in Complex Tasks,” Kognitive Systeme, vol. 3, no. 2, 2015

[29]

J. Josifovski, M. Malmir, N. Klarmann, and A. Knoll, “Continual Learning on Incremental Simulations for Real-World Robotic Manipulation Tasks,” in 2nd R:SS Workshop on Closing the Reality Gap in Sim2Real Transfer for Robotics, Corvallis, OR, USA, 2020, p. 3

[30]

“Unity 3d,” https://unity.com/, accessed: 2020-6-26

[31]

“Kuka lbr-iiwa,” https://www.kuka.com/products/robot-systems/industrial-robots/lbr-iiwa, accessed: 2020-6-26

[32]

“Ros industrial,” https://github.com/ros-industrial/kuka_experimental, accessed: 2022-2-23

[33]

C. Hennersperger, B. Fuerst, S. Virga, O. Zettinig, B. Frisch, T. Neff, and N. Navab, “Towards MRI-Based Autonomous Robotic US Acquisitions: A First Feasibility Study,” IEEE Transactions on Medical Imaging, vol. 36, no. 2, pp. 538–548, 2017

[34]

J. Tan, T. Zhang, E. Coumans, A. Iscen, Y. Bai, D. Hafner, S. Bohez, and V. Vanhoucke, “Sim-to-Real: Learning Agile Locomotion for Quadruped Robots,” in Robotics: Science and Systems (R:SS), vol. 14, Pittsburgh, PA, USA, 2018

[35]

“Unity robotics hub,” https://github.com/Unity-Technologies/Unity-Robotics-Hub/blob/main/tutorials/urdf_importer/urdf_appendix.md, accessed: 2022-2-23

[36]

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms,” Tech. Rep. arXiv:1707.06347, 2017

[37]

A. Hill, A. Raffin, M. Ernestus, A. Gleave, A. Kanervisto, R. Traore, P. Dhariwal, C. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, and Y. Wu, “Stable Baselines,” 2018

[38]

A. Raffin, “Rl baselines zoo,” https://github.com/araffin/rl-baselines-zoo, 2018
