Continual Domain Randomization

Alois Knoll; Josip Josifovski; Justus Piater; Mohammadhossein Malmir; Nicol\'as Navarro-Guerrero; Sayantan Auddy

arxiv: 2403.12193 · v2 · submitted 2024-03-18 · 💻 cs.RO

Continual Domain Randomization

Josip Josifovski , Sayantan Auddy , Mohammadhossein Malmir , Justus Piater , Alois Knoll , Nicol\'as Navarro-Guerrero This is my paper

Pith reviewed 2026-05-24 03:02 UTC · model grok-4.3

classification 💻 cs.RO

keywords domain randomizationcontinual learningreinforcement learningsim-to-real transferroboticsgraspingreaching

0 comments

The pith

Continual Domain Randomization trains robotic policies by sequentially adding simulation parameter randomizations while using continual learning to retain effects from earlier stages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that reinforcement learning policies for reaching and grasping can be trained effectively by beginning in a non-randomized simulation and then progressing through a sequence of randomization subsets, with continual learning applied to preserve performance from prior stages. This sequential approach addresses the increased task difficulty that arises when all parameters are randomized together from the start. Experiments demonstrate that the resulting policies learn successfully in simulation and transfer to real robots with robustness that matches or exceeds both full combined randomization and sequential randomization without continual learning. A reader would care because the method provides a more flexible training path that still achieves reliable sim-to-real transfer for robotic control.

Core claim

By combining domain randomization with continual learning, a policy can be trained sequentially on subsets of randomization parameters starting from a non-randomized simulation; this yields effective learning in simulation and robust real-robot performance on reaching and grasping tasks that matches or outperforms baselines using combined randomization or sequential randomization without continual learning.

What carries the argument

Continual Domain Randomization (CDR), the mechanism that applies continual learning to retain the effects of previous randomization subsets while training on new parameter groups in sequence.

If this is right

Policies achieve robust real-world transfer without needing to solve the full randomization problem simultaneously.
Training begins in an easier non-randomized setting before complexity is added incrementally.
Continual learning preserves the benefits of each randomization stage for the final policy.
The method produces performance that is at least as good as standard domain randomization on the tested robotic tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sequential schedule could be applied to other manipulation or locomotion tasks to test whether the robustness gains generalize.
Different continual learning techniques might be substituted to determine which best retains randomization effects without additional hyperparameter tuning.
The order in which parameter subsets are introduced may influence final policy quality and could be optimized as a separate design choice.

Load-bearing premise

That continual learning methods can reliably prevent catastrophic forgetting of prior randomization effects when new parameter subsets are introduced sequentially, preserving policy performance across the training sequence.

What would settle it

A direct comparison experiment in which the CDR-trained policy exhibits clearly worse real-robot success rates or measurable forgetting of earlier randomization effects than a combined-randomization baseline would falsify the central claim.

Figures

Figures reproduced from arXiv: 2403.12193 by Alois Knoll, Josip Josifovski, Justus Piater, Mohammadhossein Malmir, Nicol\'as Navarro-Guerrero, Sayantan Auddy.

**Figure 2.** Figure 2: The simulated and real environments for reaching and grasping. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Effects of different randomization parameters on sim2real transfer [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 5.** Figure 5: Training progress for the grasping task. [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: Effect of the EWC regularization constant [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

read the original abstract

Domain Randomization (DR) is commonly used for sim2real transfer of reinforcement learning (RL) policies in robotics. Most DR approaches require a simulator with a fixed set of tunable parameters from the start of the training, from which the parameters are randomized simultaneously to train a robust model for use in the real world. However, the combined randomization of many parameters increases the task difficulty and might result in sub-optimal policies. To address this problem and to provide a more flexible training process, we propose Continual Domain Randomization (CDR) for RL that combines domain randomization with continual learning to enable sequential training in simulation on a subset of randomization parameters at a time. Starting from a model trained in a non-randomized simulation where the task is easier to solve, the model is trained on a sequence of randomizations, and continual learning is employed to remember the effects of previous randomizations. Our robotic reaching and grasping tasks experiments show that the model trained in this fashion learns effectively in simulation and performs robustly on the real robot while matching or outperforming baselines that employ combined randomization or sequential randomization without continual learning. Our code and videos are available at https://continual-dr.github.io/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CDR splits multi-parameter domain randomization into sequential stages with continual learning to keep training tractable, and the real-robot reaching/grasping results match the combined-randomization baseline, but retention of earlier stages is not shown.

read the letter

The core idea is to avoid the difficulty of randomizing many simulator parameters at once by training the RL policy on subsets in sequence, starting from a deterministic sim and using continual learning to retain the effects of prior randomizations. This is presented as a practical fix for sim-to-real transfer in robotics when full simultaneous DR makes the task too hard.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Continual Domain Randomization (CDR), which augments domain randomization for RL policies by training sequentially on subsets of randomization parameters, starting from a non-randomized simulation and employing continual learning to retain effects of prior randomizations. Experiments on robotic reaching and grasping tasks indicate that CDR policies learn effectively in simulation and transfer robustly to the real robot, matching or outperforming baselines that use combined randomization or sequential randomization without continual learning.

Significance. If the empirical results hold with proper verification of retention, CDR offers a more flexible alternative to simultaneous multi-parameter randomization, potentially yielding higher-performing policies for sim-to-real transfer by reducing task difficulty during training.

major comments (2)

[Abstract, Experiments] Abstract and Experiments section: The central claim that CDR matches or outperforms the combined-randomization and sequential-without-CL baselines on real-robot reaching/grasping requires that the continual-learning component actually preserves policy performance on earlier randomization subsets after later subsets are introduced. No diagnostic metrics (e.g., return on task-1 after completing task-3), algorithm name, regularization strength, or replay-buffer size are reported, rendering the advantage dependent on an unverified retention property.
[Experiments] Experiments section: No quantitative metrics, learning curves, or statistical significance tests are provided for the reported positive results versus baselines, limiting assessment of whether the observed robustness is reliable or merely anecdotal.

minor comments (1)

[Abstract] The abstract states that code and videos are available at a URL, but the manuscript should include a brief description of the randomization schedule and task sequence to allow readers to understand the experimental protocol without external resources.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments below and will revise the manuscript to incorporate additional verification and quantitative analysis.

read point-by-point responses

Referee: [Abstract, Experiments] Abstract and Experiments section: The central claim that CDR matches or outperforms the combined-randomization and sequential-without-CL baselines on real-robot reaching/grasping requires that the continual-learning component actually preserves policy performance on earlier randomization subsets after later subsets are introduced. No diagnostic metrics (e.g., return on task-1 after completing task-3), algorithm name, regularization strength, or replay-buffer size are reported, rendering the advantage dependent on an unverified retention property.

Authors: We agree that explicit verification of retention is important for substantiating the advantage of CDR. The manuscript will be revised to include diagnostic metrics such as policy returns on earlier randomization subsets (e.g., task-1) after training on later subsets. We will also specify the continual learning algorithm used, regularization strength, and any replay buffer details to allow full verification of the retention property. revision: yes
Referee: [Experiments] Experiments section: No quantitative metrics, learning curves, or statistical significance tests are provided for the reported positive results versus baselines, limiting assessment of whether the observed robustness is reliable or merely anecdotal.

Authors: We acknowledge the lack of quantitative support in the current version. The revised manuscript will include learning curves from simulation training, quantitative real-robot performance metrics with variability measures (e.g., standard deviation across runs), and statistical significance tests comparing CDR against the baselines to demonstrate reliability of the results. revision: yes

Circularity Check

0 steps flagged

No circularity in empirical method paper

full rationale

The paper proposes Continual Domain Randomization by combining domain randomization with continual learning for sequential parameter randomization in RL, then evaluates the approach via reaching and grasping experiments on simulated and real robots against combined-randomization and sequential-without-CL baselines. No mathematical derivation, fitted parameters renamed as predictions, or self-referential definitions appear; the central claim rests on direct experimental comparison rather than any reduction to inputs by construction. No load-bearing self-citations or ansatzes are present.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, invented entities, or detailed axioms are extractable. The central claim rests on the unstated effectiveness of continual learning for this sequential randomization setting.

axioms (1)

domain assumption Continual learning techniques can mitigate forgetting across sequential additions of domain randomization parameters
The method depends on this to maintain performance when moving from one randomization subset to the next.

pith-pipeline@v0.9.0 · 5753 in / 1180 out tokens · 33427 ms · 2026-05-24T03:02:20.243246+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 5 internal anchors

[1]

Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey,

W. Zhao, J. P. Queralta, and T. Westerlund, “Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey,” in IEEE Symposium Series on Computational Intelligence (SSCI) , Canberra, ACT, Australia, Dec. 2020, pp. 737–744

work page 2020
[2]

Challenges of Real-World Reinforcement Learning: Definitions, Benchmarks and Analysis,

G. Dulac-Arnold, N. Levine, D. J. Mankowitz, J. Li, C. Paduraru, S. Gowal, and T. Hester, “Challenges of Real-World Reinforcement Learning: Definitions, Benchmarks and Analysis,” Machine Learning, vol. 110, no. 9, pp. 2419–2468, 2021

work page 2021
[3]

Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World,

J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , 2017, pp. 23–30

work page 2017
[4]

Analysis of Randomization Effects on Sim2Real Transfer in Reinforcement Learning for Robotic Manip- ulation Tasks,

J. Josifovski, M. Malmir, N. Klarmann, B. L. Zagar, N. Navarro- Guerrero, and A. Knoll, “Analysis of Randomization Effects on Sim2Real Transfer in Reinforcement Learning for Robotic Manip- ulation Tasks,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) . Kyoto, Japan: IEEE, Oct. 2022, pp. 10 193–10 200

work page 2022
[5]

Continual Lifelong Learning with Neural Networks: A Review,

G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, “Continual Lifelong Learning with Neural Networks: A Review,” Neural Networks, vol. 113, pp. 54–71, May 2019

work page 2019
[6]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms, Tech. Rep. arXiv: 1707.06347, July 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[7]

Overcoming Catastrophic Forgetting in Neural Networks,

J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell, “Overcoming Catastrophic Forgetting in Neural Networks,” Proceedings of the National Academy of Sciences, vol. 114, no. 13, pp. 3521–3526, 2017

work page 2017
[8]

Progress & Compress: A Scalable Framework for Continual Learning,

J. Schwarz, W. Czarnecki, J. Luketina, A. Grabska-Barwinska, Y . W. Teh, R. Pascanu, and R. Hadsell, “Progress & Compress: A Scalable Framework for Continual Learning,” in International Conference on Machine Learning (ICML) , vol. 80. PMLR, 2018, pp. 4528–4537

work page 2018
[9]

3D Simulation for Robot Arm Control with Deep Q-Learning,

S. James and E. Johns, “3D Simulation for Robot Arm Control with Deep Q-Learning,” in NIPS Workshop: Deep Learning for Action and Interaction, Barcelona, Spain, Dec. 2016

work page 2016
[10]

Sim2Real Transfer for Reinforcement Learning Without Dynamics Randomization,

M. Kaspar, J. D. Mu ˜noz Osorio, and J. Bock, “Sim2Real Transfer for Reinforcement Learning Without Dynamics Randomization,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Las Vegas, NV , USA: IEEE, Oct. 2020, pp. 4383–4388

work page 2020
[11]

Robot Learning from Randomized Simulations: A Review,

F. Muratore, F. Ramos, G. Turk, W. Yu, M. Gienger, and J. Peters, “Robot Learning from Randomized Simulations: A Review,”Frontiers in Robotics and AI , vol. 9, no. 799893, Apr. 2022

work page 2022
[12]

Object Detection and Pose Estimation Based on Convolutional Neural Networks Trained with Synthetic Data,

J. Josifovski, M. Kerzel, C. Pregizer, L. Posniak, and S. Wermter, “Object Detection and Pose Estimation Based on Convolutional Neural Networks Trained with Synthetic Data,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , Madrid, Spain, Oct. 2018, pp. 6269–6276

work page 2018
[13]

Sim-to- Real Transfer of Robotic Control with Dynamics Randomization,

X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to- Real Transfer of Robotic Control with Dynamics Randomization,” in IEEE International Conference on Robotics and Automation (ICRA) , Brisbane, QLD, Australia, 2018, pp. 3803–3810, iSSN: 2577-087X

work page 2018
[14]

Solving Rubik's Cube with a Robot Hand

OpenAI, I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, J. Schneider, N. Tezak, J. Tworek, P. Welinder, L. Weng, Q. Yuan, W. Zaremba, and L. Zhang, “Solving Rubik’s Cube with a Robot Hand, Tech. Rep. arXiv:1910.07113, Oct. 2019

work page internal anchor Pith review Pith/arXiv arXiv 1910
[15]

Active Domain Randomization,

B. Mehta, M. Diaz, F. Golemo, C. J. Pal, and L. Paull, “Active Domain Randomization,” in Conference on Robot Learning (CoRL) , vol. 100. Osaka, Japan: PMLR, May 2020, pp. 1162–1176

work page 2020
[16]

BayesSim: Adaptive Domain Randomization Via Probabilistic Inference for Robotics Simulators,

F. Ramos, R. Possas, and D. Fox, “BayesSim: Adaptive Domain Randomization Via Probabilistic Inference for Robotics Simulators,” in Robotics: Science and Systems (R:SS) , vol. 15, June 2019

work page 2019
[17]

Neural Posterior Domain Randomization,

F. Muratore, T. Gruner, F. Wiese, B. Belousov, M. Gienger, and J. Peters, “Neural Posterior Domain Randomization,” in Conference on Robot Learning (CoRL) , vol. 164. PMLR, Nov. 2021, pp. 1532– 1542

work page 2021
[18]

What Went Wrong? Closing the Sim-to-Real Gap Via Differentiable Causal Discovery,

P. Huang, X. Zhang, Z. Cao, S. Liu, M. Xu, W. Ding, J. Francis, B. Chen, and D. Zhao, “What Went Wrong? Closing the Sim-to-Real Gap Via Differentiable Causal Discovery,” in Conference on Robot Learning (CoRL), Atlanta, GA, USA, Aug. 2023

work page 2023
[19]

DROPO: Sim-to-Real Transfer with Offline Domain Randomization,

G. Tiboni, K. Arndt, and V . Kyrki, “DROPO: Sim-to-Real Transfer with Offline Domain Randomization,” Robotics and Autonomous Sys- tems, vol. 166, p. 104432, Aug. 2023

work page 2023
[20]

Hypernetwork-PPO for Continual Reinforcement Learning,

P. Sch ¨opf, S. Auddy, J. Hollenstein, and A. Rodr ´ıguez-S´anchez, “Hypernetwork-PPO for Continual Reinforcement Learning,” in Deep Reinforcement Learning Workshop NeurIPS 2022 , Dec. 2022

work page 2022
[21]

Continual Learning from Demonstration of Robotics Skills,

S. Auddy, J. Hollenstein, M. Saveriano, A. Rodr ´ıguez-S´anchez, and J. Piater, “Continual Learning from Demonstration of Robotics Skills,” Robotics and Autonomous Systems , vol. 165, p. 104427, 2023

work page 2023
[22]

——, “Scalable and Efficient Continual Learning from Demonstration Via a Hypernetwork-Generated Stable Dynamics Model, Tech. Rep. arXiv:2311.03600, Jan. 2024, eprint: 2311.03600

work page internal anchor Pith review Pith/arXiv arXiv 2024
[23]

Towards Continual Reinforcement Learning: A Review and Perspectives,

K. Khetarpal, M. Riemer, I. Rish, and D. Precup, “Towards Continual Reinforcement Learning: A Review and Perspectives,” Journal of Artificial Intelligence Research, vol. 75, pp. 1401–1476, Dec. 2022

work page 2022
[24]

Relay Hindsight Experience Replay: Self-Guided Continual Reinforcement Learning for Sequential Object Manipulation Tasks with Sparse Rewards,

Y . Luo, Y . Wang, K. Dong, Q. Zhang, E. Cheng, Z. Sun, and B. Song, “Relay Hindsight Experience Replay: Self-Guided Continual Reinforcement Learning for Sequential Object Manipulation Tasks with Sparse Rewards,” Neurocomputing, vol. 557, p. 126620, Nov. 2023

work page 2023
[25]

Continual Learning on Incremental Simulations for Real-World Robotic Manip- ulation Tasks,

J. Josifovski, M. Malmir, N. Klarmann, and A. Knoll, “Continual Learning on Incremental Simulations for Real-World Robotic Manip- ulation Tasks,” in 2nd R:SS Workshop on Closing the Reality Gap in Sim2Real Transfer for Robotics, Corvallis, OR, USA, July 2020, p. 3

work page 2020
[26]

A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell, “Progressive Neural Networks, Tech. Rep. arXiv:1606.04671, Oct. 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[27]

Sim-to-Real Robot Learning from Pixels with Progressive Nets,

A. A. Rusu, M. Ve ˇcer´ık, T. Roth ¨orl, N. Heess, R. Pascanu, and R. Hadsell, “Sim-to-Real Robot Learning from Pixels with Progressive Nets,” in Annual Conference on Robot Learning (CoRL) , vol. 78. Mountain View, CA, USA: PMLR, 2017, pp. 262–270, iSSN: 2640- 3498

work page 2017
[28]

Policy Distillation

A. A. Rusu, S. G. Colmenarejo, C. Gulcehre, G. Desjardins, J. Kirk- patrick, R. Pascanu, V . Mnih, K. Kavukcuoglu, and R. Hadsell, “Policy Distillation,” in International Conference on Learning Representations (ICLR). San Juan, Puerto Rico: arXiv, May 2016, eprint: 1511.06295

work page internal anchor Pith review Pith/arXiv arXiv 2016
[29]

Continual Reinforcement Learning Deployed in Real- Life Using Policy Distillation and Sim2real Transfer,

R. Traor ´e, H. Caselles-Dupr ´e, T. Lesort, T. Sun, N. D ´ıaz-Rodr´ıguez, and D. Filliat, “Continual Reinforcement Learning Deployed in Real- Life Using Policy Distillation and Sim2real Transfer,” in ICML Work- shop on Multi-Task and Lifelong Learning . arXiv, June 2019

work page 2019
[30]

UNCLEAR: A Straightforward Method for Continual Reinforcement Learning,

S. Kessler, J. Parker-Holder, P. Ball, S. Zohren, and S. J. Roberts, “UNCLEAR: A Straightforward Method for Continual Reinforcement Learning,” in ICML Workshop on Continual Learning , vol. 108. Vienna, Austria: PMLR, 2020

work page 2020
[31]

Safety-Oriented Stability Biases for Continual Learning,

A. Gaurav, “Safety-Oriented Stability Biases for Continual Learning,” Master’s thesis, University of Waterloo, 2020

work page 2020
[32]

Robotiq Gripper

“Robotiq Gripper.” [Online]. Available: http://robotiq.com/products/ industrial-robot-hand/

work page
[33]

A Comparison of Action Spaces for Learning Manipulation Tasks,

P. Varin, L. Grossman, and S. Kuindersma, “A Comparison of Action Spaces for Learning Manipulation Tasks,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , Macau,China, 2019, pp. 6015–6021

work page 2019
[34]

Parameter Identification of the KUKA LBR iiwa Robot Including Constraints on Physical Feasibility,

Y . R. St ¨urz, L. M. Affolter, and R. S. Smith, “Parameter Identification of the KUKA LBR iiwa Robot Including Constraints on Physical Feasibility,”IFAC-PapersOnLine, vol. 50, no. 1, pp. 6863–6868, 2017

work page 2017
[35]

ROS-Industrial — Applying the Robot Operating System (ROS) to Industrial Applications,

S. Edwards and C. Lewis, “ROS-Industrial — Applying the Robot Operating System (ROS) to Industrial Applications,” in ECHORD Workshop at the IEEE International Conference on Robotics and Automation (ICRA), St. Paul, MN, USA, 2012

work page 2012
[36]

Towards MRI-Based Autonomous Robotic Us Ac- quisitions: A First Feasibility Study,

C. Hennersperger, B. Fuerst, S. Virga, O. Zettinig, B. Frisch, T. Neff, and N. Navab, “Towards MRI-Based Autonomous Robotic Us Ac- quisitions: A First Feasibility Study,” IEEE Transactions on Medical Imaging, vol. 36, no. 2, pp. 538–548, Feb. 2017

work page 2017
[37]

ROS: An Open-Source Robot Operating System,

M. Quigley, B. Gerkey, K. Conley, J. Faust, T. Foote, J. Leibs, E. Berger, R. Wheeler, and A. Ng, “ROS: An Open-Source Robot Operating System,” in ICRA Workshop on Open Source Software , vol. 3, 2009, p. 6

work page 2009
[38]

Stable Baselines,

A. Hill, A. Raffin, M. Ernestus, A. Gleave, A. Kanervisto, R. Traore, P. Dhariwal, C. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, and Y . Wu, “Stable Baselines,” 2018

work page 2018
[39]

Smooth Exploration for Robotic Reinforcement Learning,

A. Raffin, J. Kober, and F. Stulp, “Smooth Exploration for Robotic Reinforcement Learning,” in Conference on Robot Learning (CoRL) , vol. 164. London, UK: PMLR, Jan. 2022, pp. 1634–1644

work page 2022
[40]

Malmir, J

M. Malmir, J. Josifovski, N. Klarmann, and A. Knoll, “DiAReL: Rein- forcement Learning with Disturbance Awareness for Robust Sim2Real Policy Transfer in Robot Control, Tech. Rep. arXiv:2306.09010, 2023

work page arXiv 2023
[41]

Cyclic policy distillation: Sample-efficient sim-to-real reinforcement learning with domain randomization,

Y . Kadokawa, L. Zhu, Y . Tsurumine, and T. Matsubara, “Cyclic policy distillation: Sample-efficient sim-to-real reinforcement learning with domain randomization,” Robotics and Autonomous Systems , vol. 165, p. 104425, 2023

work page 2023

[1] [1]

Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey,

W. Zhao, J. P. Queralta, and T. Westerlund, “Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey,” in IEEE Symposium Series on Computational Intelligence (SSCI) , Canberra, ACT, Australia, Dec. 2020, pp. 737–744

work page 2020

[2] [2]

Challenges of Real-World Reinforcement Learning: Definitions, Benchmarks and Analysis,

G. Dulac-Arnold, N. Levine, D. J. Mankowitz, J. Li, C. Paduraru, S. Gowal, and T. Hester, “Challenges of Real-World Reinforcement Learning: Definitions, Benchmarks and Analysis,” Machine Learning, vol. 110, no. 9, pp. 2419–2468, 2021

work page 2021

[3] [3]

Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World,

J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , 2017, pp. 23–30

work page 2017

[4] [4]

Analysis of Randomization Effects on Sim2Real Transfer in Reinforcement Learning for Robotic Manip- ulation Tasks,

J. Josifovski, M. Malmir, N. Klarmann, B. L. Zagar, N. Navarro- Guerrero, and A. Knoll, “Analysis of Randomization Effects on Sim2Real Transfer in Reinforcement Learning for Robotic Manip- ulation Tasks,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) . Kyoto, Japan: IEEE, Oct. 2022, pp. 10 193–10 200

work page 2022

[5] [5]

Continual Lifelong Learning with Neural Networks: A Review,

G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, “Continual Lifelong Learning with Neural Networks: A Review,” Neural Networks, vol. 113, pp. 54–71, May 2019

work page 2019

[6] [6]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms, Tech. Rep. arXiv: 1707.06347, July 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[7] [7]

Overcoming Catastrophic Forgetting in Neural Networks,

J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell, “Overcoming Catastrophic Forgetting in Neural Networks,” Proceedings of the National Academy of Sciences, vol. 114, no. 13, pp. 3521–3526, 2017

work page 2017

[8] [8]

Progress & Compress: A Scalable Framework for Continual Learning,

J. Schwarz, W. Czarnecki, J. Luketina, A. Grabska-Barwinska, Y . W. Teh, R. Pascanu, and R. Hadsell, “Progress & Compress: A Scalable Framework for Continual Learning,” in International Conference on Machine Learning (ICML) , vol. 80. PMLR, 2018, pp. 4528–4537

work page 2018

[9] [9]

3D Simulation for Robot Arm Control with Deep Q-Learning,

S. James and E. Johns, “3D Simulation for Robot Arm Control with Deep Q-Learning,” in NIPS Workshop: Deep Learning for Action and Interaction, Barcelona, Spain, Dec. 2016

work page 2016

[10] [10]

Sim2Real Transfer for Reinforcement Learning Without Dynamics Randomization,

M. Kaspar, J. D. Mu ˜noz Osorio, and J. Bock, “Sim2Real Transfer for Reinforcement Learning Without Dynamics Randomization,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Las Vegas, NV , USA: IEEE, Oct. 2020, pp. 4383–4388

work page 2020

[11] [11]

Robot Learning from Randomized Simulations: A Review,

F. Muratore, F. Ramos, G. Turk, W. Yu, M. Gienger, and J. Peters, “Robot Learning from Randomized Simulations: A Review,”Frontiers in Robotics and AI , vol. 9, no. 799893, Apr. 2022

work page 2022

[12] [12]

Object Detection and Pose Estimation Based on Convolutional Neural Networks Trained with Synthetic Data,

J. Josifovski, M. Kerzel, C. Pregizer, L. Posniak, and S. Wermter, “Object Detection and Pose Estimation Based on Convolutional Neural Networks Trained with Synthetic Data,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , Madrid, Spain, Oct. 2018, pp. 6269–6276

work page 2018

[13] [13]

Sim-to- Real Transfer of Robotic Control with Dynamics Randomization,

X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to- Real Transfer of Robotic Control with Dynamics Randomization,” in IEEE International Conference on Robotics and Automation (ICRA) , Brisbane, QLD, Australia, 2018, pp. 3803–3810, iSSN: 2577-087X

work page 2018

[14] [14]

Solving Rubik's Cube with a Robot Hand

OpenAI, I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, J. Schneider, N. Tezak, J. Tworek, P. Welinder, L. Weng, Q. Yuan, W. Zaremba, and L. Zhang, “Solving Rubik’s Cube with a Robot Hand, Tech. Rep. arXiv:1910.07113, Oct. 2019

work page internal anchor Pith review Pith/arXiv arXiv 1910

[15] [15]

Active Domain Randomization,

B. Mehta, M. Diaz, F. Golemo, C. J. Pal, and L. Paull, “Active Domain Randomization,” in Conference on Robot Learning (CoRL) , vol. 100. Osaka, Japan: PMLR, May 2020, pp. 1162–1176

work page 2020

[16] [16]

BayesSim: Adaptive Domain Randomization Via Probabilistic Inference for Robotics Simulators,

F. Ramos, R. Possas, and D. Fox, “BayesSim: Adaptive Domain Randomization Via Probabilistic Inference for Robotics Simulators,” in Robotics: Science and Systems (R:SS) , vol. 15, June 2019

work page 2019

[17] [17]

Neural Posterior Domain Randomization,

F. Muratore, T. Gruner, F. Wiese, B. Belousov, M. Gienger, and J. Peters, “Neural Posterior Domain Randomization,” in Conference on Robot Learning (CoRL) , vol. 164. PMLR, Nov. 2021, pp. 1532– 1542

work page 2021

[18] [18]

What Went Wrong? Closing the Sim-to-Real Gap Via Differentiable Causal Discovery,

P. Huang, X. Zhang, Z. Cao, S. Liu, M. Xu, W. Ding, J. Francis, B. Chen, and D. Zhao, “What Went Wrong? Closing the Sim-to-Real Gap Via Differentiable Causal Discovery,” in Conference on Robot Learning (CoRL), Atlanta, GA, USA, Aug. 2023

work page 2023

[19] [19]

DROPO: Sim-to-Real Transfer with Offline Domain Randomization,

G. Tiboni, K. Arndt, and V . Kyrki, “DROPO: Sim-to-Real Transfer with Offline Domain Randomization,” Robotics and Autonomous Sys- tems, vol. 166, p. 104432, Aug. 2023

work page 2023

[20] [20]

Hypernetwork-PPO for Continual Reinforcement Learning,

P. Sch ¨opf, S. Auddy, J. Hollenstein, and A. Rodr ´ıguez-S´anchez, “Hypernetwork-PPO for Continual Reinforcement Learning,” in Deep Reinforcement Learning Workshop NeurIPS 2022 , Dec. 2022

work page 2022

[21] [21]

Continual Learning from Demonstration of Robotics Skills,

S. Auddy, J. Hollenstein, M. Saveriano, A. Rodr ´ıguez-S´anchez, and J. Piater, “Continual Learning from Demonstration of Robotics Skills,” Robotics and Autonomous Systems , vol. 165, p. 104427, 2023

work page 2023

[22] [22]

——, “Scalable and Efficient Continual Learning from Demonstration Via a Hypernetwork-Generated Stable Dynamics Model, Tech. Rep. arXiv:2311.03600, Jan. 2024, eprint: 2311.03600

work page internal anchor Pith review Pith/arXiv arXiv 2024

[23] [23]

Towards Continual Reinforcement Learning: A Review and Perspectives,

K. Khetarpal, M. Riemer, I. Rish, and D. Precup, “Towards Continual Reinforcement Learning: A Review and Perspectives,” Journal of Artificial Intelligence Research, vol. 75, pp. 1401–1476, Dec. 2022

work page 2022

[24] [24]

Relay Hindsight Experience Replay: Self-Guided Continual Reinforcement Learning for Sequential Object Manipulation Tasks with Sparse Rewards,

Y . Luo, Y . Wang, K. Dong, Q. Zhang, E. Cheng, Z. Sun, and B. Song, “Relay Hindsight Experience Replay: Self-Guided Continual Reinforcement Learning for Sequential Object Manipulation Tasks with Sparse Rewards,” Neurocomputing, vol. 557, p. 126620, Nov. 2023

work page 2023

[25] [25]

Continual Learning on Incremental Simulations for Real-World Robotic Manip- ulation Tasks,

J. Josifovski, M. Malmir, N. Klarmann, and A. Knoll, “Continual Learning on Incremental Simulations for Real-World Robotic Manip- ulation Tasks,” in 2nd R:SS Workshop on Closing the Reality Gap in Sim2Real Transfer for Robotics, Corvallis, OR, USA, July 2020, p. 3

work page 2020

[26] [26]

A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell, “Progressive Neural Networks, Tech. Rep. arXiv:1606.04671, Oct. 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[27] [27]

Sim-to-Real Robot Learning from Pixels with Progressive Nets,

A. A. Rusu, M. Ve ˇcer´ık, T. Roth ¨orl, N. Heess, R. Pascanu, and R. Hadsell, “Sim-to-Real Robot Learning from Pixels with Progressive Nets,” in Annual Conference on Robot Learning (CoRL) , vol. 78. Mountain View, CA, USA: PMLR, 2017, pp. 262–270, iSSN: 2640- 3498

work page 2017

[28] [28]

Policy Distillation

A. A. Rusu, S. G. Colmenarejo, C. Gulcehre, G. Desjardins, J. Kirk- patrick, R. Pascanu, V . Mnih, K. Kavukcuoglu, and R. Hadsell, “Policy Distillation,” in International Conference on Learning Representations (ICLR). San Juan, Puerto Rico: arXiv, May 2016, eprint: 1511.06295

work page internal anchor Pith review Pith/arXiv arXiv 2016

[29] [29]

Continual Reinforcement Learning Deployed in Real- Life Using Policy Distillation and Sim2real Transfer,

R. Traor ´e, H. Caselles-Dupr ´e, T. Lesort, T. Sun, N. D ´ıaz-Rodr´ıguez, and D. Filliat, “Continual Reinforcement Learning Deployed in Real- Life Using Policy Distillation and Sim2real Transfer,” in ICML Work- shop on Multi-Task and Lifelong Learning . arXiv, June 2019

work page 2019

[30] [30]

UNCLEAR: A Straightforward Method for Continual Reinforcement Learning,

S. Kessler, J. Parker-Holder, P. Ball, S. Zohren, and S. J. Roberts, “UNCLEAR: A Straightforward Method for Continual Reinforcement Learning,” in ICML Workshop on Continual Learning , vol. 108. Vienna, Austria: PMLR, 2020

work page 2020

[31] [31]

Safety-Oriented Stability Biases for Continual Learning,

A. Gaurav, “Safety-Oriented Stability Biases for Continual Learning,” Master’s thesis, University of Waterloo, 2020

work page 2020

[32] [32]

Robotiq Gripper

“Robotiq Gripper.” [Online]. Available: http://robotiq.com/products/ industrial-robot-hand/

work page

[33] [33]

A Comparison of Action Spaces for Learning Manipulation Tasks,

P. Varin, L. Grossman, and S. Kuindersma, “A Comparison of Action Spaces for Learning Manipulation Tasks,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , Macau,China, 2019, pp. 6015–6021

work page 2019

[34] [34]

Parameter Identification of the KUKA LBR iiwa Robot Including Constraints on Physical Feasibility,

Y . R. St ¨urz, L. M. Affolter, and R. S. Smith, “Parameter Identification of the KUKA LBR iiwa Robot Including Constraints on Physical Feasibility,”IFAC-PapersOnLine, vol. 50, no. 1, pp. 6863–6868, 2017

work page 2017

[35] [35]

ROS-Industrial — Applying the Robot Operating System (ROS) to Industrial Applications,

S. Edwards and C. Lewis, “ROS-Industrial — Applying the Robot Operating System (ROS) to Industrial Applications,” in ECHORD Workshop at the IEEE International Conference on Robotics and Automation (ICRA), St. Paul, MN, USA, 2012

work page 2012

[36] [36]

Towards MRI-Based Autonomous Robotic Us Ac- quisitions: A First Feasibility Study,

C. Hennersperger, B. Fuerst, S. Virga, O. Zettinig, B. Frisch, T. Neff, and N. Navab, “Towards MRI-Based Autonomous Robotic Us Ac- quisitions: A First Feasibility Study,” IEEE Transactions on Medical Imaging, vol. 36, no. 2, pp. 538–548, Feb. 2017

work page 2017

[37] [37]

ROS: An Open-Source Robot Operating System,

M. Quigley, B. Gerkey, K. Conley, J. Faust, T. Foote, J. Leibs, E. Berger, R. Wheeler, and A. Ng, “ROS: An Open-Source Robot Operating System,” in ICRA Workshop on Open Source Software , vol. 3, 2009, p. 6

work page 2009

[38] [38]

Stable Baselines,

A. Hill, A. Raffin, M. Ernestus, A. Gleave, A. Kanervisto, R. Traore, P. Dhariwal, C. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, and Y . Wu, “Stable Baselines,” 2018

work page 2018

[39] [39]

Smooth Exploration for Robotic Reinforcement Learning,

A. Raffin, J. Kober, and F. Stulp, “Smooth Exploration for Robotic Reinforcement Learning,” in Conference on Robot Learning (CoRL) , vol. 164. London, UK: PMLR, Jan. 2022, pp. 1634–1644

work page 2022

[40] [40]

Malmir, J

M. Malmir, J. Josifovski, N. Klarmann, and A. Knoll, “DiAReL: Rein- forcement Learning with Disturbance Awareness for Robust Sim2Real Policy Transfer in Robot Control, Tech. Rep. arXiv:2306.09010, 2023

work page arXiv 2023

[41] [41]

Cyclic policy distillation: Sample-efficient sim-to-real reinforcement learning with domain randomization,

Y . Kadokawa, L. Zhu, Y . Tsurumine, and T. Matsubara, “Cyclic policy distillation: Sample-efficient sim-to-real reinforcement learning with domain randomization,” Robotics and Autonomous Systems , vol. 165, p. 104425, 2023

work page 2023