pith. sign in

arxiv: 2403.12193 · v2 · submitted 2024-03-18 · 💻 cs.RO

Continual Domain Randomization

Pith reviewed 2026-05-24 03:02 UTC · model grok-4.3

classification 💻 cs.RO
keywords domain randomizationcontinual learningreinforcement learningsim-to-real transferroboticsgraspingreaching
0
0 comments X

The pith

Continual Domain Randomization trains robotic policies by sequentially adding simulation parameter randomizations while using continual learning to retain effects from earlier stages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that reinforcement learning policies for reaching and grasping can be trained effectively by beginning in a non-randomized simulation and then progressing through a sequence of randomization subsets, with continual learning applied to preserve performance from prior stages. This sequential approach addresses the increased task difficulty that arises when all parameters are randomized together from the start. Experiments demonstrate that the resulting policies learn successfully in simulation and transfer to real robots with robustness that matches or exceeds both full combined randomization and sequential randomization without continual learning. A reader would care because the method provides a more flexible training path that still achieves reliable sim-to-real transfer for robotic control.

Core claim

By combining domain randomization with continual learning, a policy can be trained sequentially on subsets of randomization parameters starting from a non-randomized simulation; this yields effective learning in simulation and robust real-robot performance on reaching and grasping tasks that matches or outperforms baselines using combined randomization or sequential randomization without continual learning.

What carries the argument

Continual Domain Randomization (CDR), the mechanism that applies continual learning to retain the effects of previous randomization subsets while training on new parameter groups in sequence.

If this is right

  • Policies achieve robust real-world transfer without needing to solve the full randomization problem simultaneously.
  • Training begins in an easier non-randomized setting before complexity is added incrementally.
  • Continual learning preserves the benefits of each randomization stage for the final policy.
  • The method produces performance that is at least as good as standard domain randomization on the tested robotic tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sequential schedule could be applied to other manipulation or locomotion tasks to test whether the robustness gains generalize.
  • Different continual learning techniques might be substituted to determine which best retains randomization effects without additional hyperparameter tuning.
  • The order in which parameter subsets are introduced may influence final policy quality and could be optimized as a separate design choice.

Load-bearing premise

That continual learning methods can reliably prevent catastrophic forgetting of prior randomization effects when new parameter subsets are introduced sequentially, preserving policy performance across the training sequence.

What would settle it

A direct comparison experiment in which the CDR-trained policy exhibits clearly worse real-robot success rates or measurable forgetting of earlier randomization effects than a combined-randomization baseline would falsify the central claim.

Figures

Figures reproduced from arXiv: 2403.12193 by Alois Knoll, Josip Josifovski, Justus Piater, Mohammadhossein Malmir, Nicol\'as Navarro-Guerrero, Sayantan Auddy.

Figure 1
Figure 1. Figure 1: Overview of our proposed CDR approach. CDR- [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The simulated and real environments for reaching and grasping. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Effects of different randomization parameters on sim2real transfer [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Training progress for the grasping task. [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Effect of the EWC regularization constant [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗
read the original abstract

Domain Randomization (DR) is commonly used for sim2real transfer of reinforcement learning (RL) policies in robotics. Most DR approaches require a simulator with a fixed set of tunable parameters from the start of the training, from which the parameters are randomized simultaneously to train a robust model for use in the real world. However, the combined randomization of many parameters increases the task difficulty and might result in sub-optimal policies. To address this problem and to provide a more flexible training process, we propose Continual Domain Randomization (CDR) for RL that combines domain randomization with continual learning to enable sequential training in simulation on a subset of randomization parameters at a time. Starting from a model trained in a non-randomized simulation where the task is easier to solve, the model is trained on a sequence of randomizations, and continual learning is employed to remember the effects of previous randomizations. Our robotic reaching and grasping tasks experiments show that the model trained in this fashion learns effectively in simulation and performs robustly on the real robot while matching or outperforming baselines that employ combined randomization or sequential randomization without continual learning. Our code and videos are available at https://continual-dr.github.io/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Continual Domain Randomization (CDR), which augments domain randomization for RL policies by training sequentially on subsets of randomization parameters, starting from a non-randomized simulation and employing continual learning to retain effects of prior randomizations. Experiments on robotic reaching and grasping tasks indicate that CDR policies learn effectively in simulation and transfer robustly to the real robot, matching or outperforming baselines that use combined randomization or sequential randomization without continual learning.

Significance. If the empirical results hold with proper verification of retention, CDR offers a more flexible alternative to simultaneous multi-parameter randomization, potentially yielding higher-performing policies for sim-to-real transfer by reducing task difficulty during training.

major comments (2)
  1. [Abstract, Experiments] Abstract and Experiments section: The central claim that CDR matches or outperforms the combined-randomization and sequential-without-CL baselines on real-robot reaching/grasping requires that the continual-learning component actually preserves policy performance on earlier randomization subsets after later subsets are introduced. No diagnostic metrics (e.g., return on task-1 after completing task-3), algorithm name, regularization strength, or replay-buffer size are reported, rendering the advantage dependent on an unverified retention property.
  2. [Experiments] Experiments section: No quantitative metrics, learning curves, or statistical significance tests are provided for the reported positive results versus baselines, limiting assessment of whether the observed robustness is reliable or merely anecdotal.
minor comments (1)
  1. [Abstract] The abstract states that code and videos are available at a URL, but the manuscript should include a brief description of the randomization schedule and task sequence to allow readers to understand the experimental protocol without external resources.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments below and will revise the manuscript to incorporate additional verification and quantitative analysis.

read point-by-point responses
  1. Referee: [Abstract, Experiments] Abstract and Experiments section: The central claim that CDR matches or outperforms the combined-randomization and sequential-without-CL baselines on real-robot reaching/grasping requires that the continual-learning component actually preserves policy performance on earlier randomization subsets after later subsets are introduced. No diagnostic metrics (e.g., return on task-1 after completing task-3), algorithm name, regularization strength, or replay-buffer size are reported, rendering the advantage dependent on an unverified retention property.

    Authors: We agree that explicit verification of retention is important for substantiating the advantage of CDR. The manuscript will be revised to include diagnostic metrics such as policy returns on earlier randomization subsets (e.g., task-1) after training on later subsets. We will also specify the continual learning algorithm used, regularization strength, and any replay buffer details to allow full verification of the retention property. revision: yes

  2. Referee: [Experiments] Experiments section: No quantitative metrics, learning curves, or statistical significance tests are provided for the reported positive results versus baselines, limiting assessment of whether the observed robustness is reliable or merely anecdotal.

    Authors: We acknowledge the lack of quantitative support in the current version. The revised manuscript will include learning curves from simulation training, quantitative real-robot performance metrics with variability measures (e.g., standard deviation across runs), and statistical significance tests comparing CDR against the baselines to demonstrate reliability of the results. revision: yes

Circularity Check

0 steps flagged

No circularity in empirical method paper

full rationale

The paper proposes Continual Domain Randomization by combining domain randomization with continual learning for sequential parameter randomization in RL, then evaluates the approach via reaching and grasping experiments on simulated and real robots against combined-randomization and sequential-without-CL baselines. No mathematical derivation, fitted parameters renamed as predictions, or self-referential definitions appear; the central claim rests on direct experimental comparison rather than any reduction to inputs by construction. No load-bearing self-citations or ansatzes are present.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, invented entities, or detailed axioms are extractable. The central claim rests on the unstated effectiveness of continual learning for this sequential randomization setting.

axioms (1)
  • domain assumption Continual learning techniques can mitigate forgetting across sequential additions of domain randomization parameters
    The method depends on this to maintain performance when moving from one randomization subset to the next.

pith-pipeline@v0.9.0 · 5753 in / 1180 out tokens · 33427 ms · 2026-05-24T03:02:20.243246+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 5 internal anchors

  1. [1]

    Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey,

    W. Zhao, J. P. Queralta, and T. Westerlund, “Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey,” in IEEE Symposium Series on Computational Intelligence (SSCI) , Canberra, ACT, Australia, Dec. 2020, pp. 737–744

  2. [2]

    Challenges of Real-World Reinforcement Learning: Definitions, Benchmarks and Analysis,

    G. Dulac-Arnold, N. Levine, D. J. Mankowitz, J. Li, C. Paduraru, S. Gowal, and T. Hester, “Challenges of Real-World Reinforcement Learning: Definitions, Benchmarks and Analysis,” Machine Learning, vol. 110, no. 9, pp. 2419–2468, 2021

  3. [3]

    Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World,

    J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , 2017, pp. 23–30

  4. [4]

    Analysis of Randomization Effects on Sim2Real Transfer in Reinforcement Learning for Robotic Manip- ulation Tasks,

    J. Josifovski, M. Malmir, N. Klarmann, B. L. Zagar, N. Navarro- Guerrero, and A. Knoll, “Analysis of Randomization Effects on Sim2Real Transfer in Reinforcement Learning for Robotic Manip- ulation Tasks,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) . Kyoto, Japan: IEEE, Oct. 2022, pp. 10 193–10 200

  5. [5]

    Continual Lifelong Learning with Neural Networks: A Review,

    G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, “Continual Lifelong Learning with Neural Networks: A Review,” Neural Networks, vol. 113, pp. 54–71, May 2019

  6. [6]

    Proximal Policy Optimization Algorithms

    J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms, Tech. Rep. arXiv: 1707.06347, July 2017

  7. [7]

    Overcoming Catastrophic Forgetting in Neural Networks,

    J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell, “Overcoming Catastrophic Forgetting in Neural Networks,” Proceedings of the National Academy of Sciences, vol. 114, no. 13, pp. 3521–3526, 2017

  8. [8]

    Progress & Compress: A Scalable Framework for Continual Learning,

    J. Schwarz, W. Czarnecki, J. Luketina, A. Grabska-Barwinska, Y . W. Teh, R. Pascanu, and R. Hadsell, “Progress & Compress: A Scalable Framework for Continual Learning,” in International Conference on Machine Learning (ICML) , vol. 80. PMLR, 2018, pp. 4528–4537

  9. [9]

    3D Simulation for Robot Arm Control with Deep Q-Learning,

    S. James and E. Johns, “3D Simulation for Robot Arm Control with Deep Q-Learning,” in NIPS Workshop: Deep Learning for Action and Interaction, Barcelona, Spain, Dec. 2016

  10. [10]

    Sim2Real Transfer for Reinforcement Learning Without Dynamics Randomization,

    M. Kaspar, J. D. Mu ˜noz Osorio, and J. Bock, “Sim2Real Transfer for Reinforcement Learning Without Dynamics Randomization,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Las Vegas, NV , USA: IEEE, Oct. 2020, pp. 4383–4388

  11. [11]

    Robot Learning from Randomized Simulations: A Review,

    F. Muratore, F. Ramos, G. Turk, W. Yu, M. Gienger, and J. Peters, “Robot Learning from Randomized Simulations: A Review,”Frontiers in Robotics and AI , vol. 9, no. 799893, Apr. 2022

  12. [12]

    Object Detection and Pose Estimation Based on Convolutional Neural Networks Trained with Synthetic Data,

    J. Josifovski, M. Kerzel, C. Pregizer, L. Posniak, and S. Wermter, “Object Detection and Pose Estimation Based on Convolutional Neural Networks Trained with Synthetic Data,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , Madrid, Spain, Oct. 2018, pp. 6269–6276

  13. [13]

    Sim-to- Real Transfer of Robotic Control with Dynamics Randomization,

    X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to- Real Transfer of Robotic Control with Dynamics Randomization,” in IEEE International Conference on Robotics and Automation (ICRA) , Brisbane, QLD, Australia, 2018, pp. 3803–3810, iSSN: 2577-087X

  14. [14]

    Solving Rubik's Cube with a Robot Hand

    OpenAI, I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, J. Schneider, N. Tezak, J. Tworek, P. Welinder, L. Weng, Q. Yuan, W. Zaremba, and L. Zhang, “Solving Rubik’s Cube with a Robot Hand, Tech. Rep. arXiv:1910.07113, Oct. 2019

  15. [15]

    Active Domain Randomization,

    B. Mehta, M. Diaz, F. Golemo, C. J. Pal, and L. Paull, “Active Domain Randomization,” in Conference on Robot Learning (CoRL) , vol. 100. Osaka, Japan: PMLR, May 2020, pp. 1162–1176

  16. [16]

    BayesSim: Adaptive Domain Randomization Via Probabilistic Inference for Robotics Simulators,

    F. Ramos, R. Possas, and D. Fox, “BayesSim: Adaptive Domain Randomization Via Probabilistic Inference for Robotics Simulators,” in Robotics: Science and Systems (R:SS) , vol. 15, June 2019

  17. [17]

    Neural Posterior Domain Randomization,

    F. Muratore, T. Gruner, F. Wiese, B. Belousov, M. Gienger, and J. Peters, “Neural Posterior Domain Randomization,” in Conference on Robot Learning (CoRL) , vol. 164. PMLR, Nov. 2021, pp. 1532– 1542

  18. [18]

    What Went Wrong? Closing the Sim-to-Real Gap Via Differentiable Causal Discovery,

    P. Huang, X. Zhang, Z. Cao, S. Liu, M. Xu, W. Ding, J. Francis, B. Chen, and D. Zhao, “What Went Wrong? Closing the Sim-to-Real Gap Via Differentiable Causal Discovery,” in Conference on Robot Learning (CoRL), Atlanta, GA, USA, Aug. 2023

  19. [19]

    DROPO: Sim-to-Real Transfer with Offline Domain Randomization,

    G. Tiboni, K. Arndt, and V . Kyrki, “DROPO: Sim-to-Real Transfer with Offline Domain Randomization,” Robotics and Autonomous Sys- tems, vol. 166, p. 104432, Aug. 2023

  20. [20]

    Hypernetwork-PPO for Continual Reinforcement Learning,

    P. Sch ¨opf, S. Auddy, J. Hollenstein, and A. Rodr ´ıguez-S´anchez, “Hypernetwork-PPO for Continual Reinforcement Learning,” in Deep Reinforcement Learning Workshop NeurIPS 2022 , Dec. 2022

  21. [21]

    Continual Learning from Demonstration of Robotics Skills,

    S. Auddy, J. Hollenstein, M. Saveriano, A. Rodr ´ıguez-S´anchez, and J. Piater, “Continual Learning from Demonstration of Robotics Skills,” Robotics and Autonomous Systems , vol. 165, p. 104427, 2023

  22. [22]

    ——, “Scalable and Efficient Continual Learning from Demonstration Via a Hypernetwork-Generated Stable Dynamics Model, Tech. Rep. arXiv:2311.03600, Jan. 2024, eprint: 2311.03600

  23. [23]

    Towards Continual Reinforcement Learning: A Review and Perspectives,

    K. Khetarpal, M. Riemer, I. Rish, and D. Precup, “Towards Continual Reinforcement Learning: A Review and Perspectives,” Journal of Artificial Intelligence Research, vol. 75, pp. 1401–1476, Dec. 2022

  24. [24]

    Relay Hindsight Experience Replay: Self-Guided Continual Reinforcement Learning for Sequential Object Manipulation Tasks with Sparse Rewards,

    Y . Luo, Y . Wang, K. Dong, Q. Zhang, E. Cheng, Z. Sun, and B. Song, “Relay Hindsight Experience Replay: Self-Guided Continual Reinforcement Learning for Sequential Object Manipulation Tasks with Sparse Rewards,” Neurocomputing, vol. 557, p. 126620, Nov. 2023

  25. [25]

    Continual Learning on Incremental Simulations for Real-World Robotic Manip- ulation Tasks,

    J. Josifovski, M. Malmir, N. Klarmann, and A. Knoll, “Continual Learning on Incremental Simulations for Real-World Robotic Manip- ulation Tasks,” in 2nd R:SS Workshop on Closing the Reality Gap in Sim2Real Transfer for Robotics, Corvallis, OR, USA, July 2020, p. 3

  26. [26]

    A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell, “Progressive Neural Networks, Tech. Rep. arXiv:1606.04671, Oct. 2016

  27. [27]

    Sim-to-Real Robot Learning from Pixels with Progressive Nets,

    A. A. Rusu, M. Ve ˇcer´ık, T. Roth ¨orl, N. Heess, R. Pascanu, and R. Hadsell, “Sim-to-Real Robot Learning from Pixels with Progressive Nets,” in Annual Conference on Robot Learning (CoRL) , vol. 78. Mountain View, CA, USA: PMLR, 2017, pp. 262–270, iSSN: 2640- 3498

  28. [28]

    Policy Distillation

    A. A. Rusu, S. G. Colmenarejo, C. Gulcehre, G. Desjardins, J. Kirk- patrick, R. Pascanu, V . Mnih, K. Kavukcuoglu, and R. Hadsell, “Policy Distillation,” in International Conference on Learning Representations (ICLR). San Juan, Puerto Rico: arXiv, May 2016, eprint: 1511.06295

  29. [29]

    Continual Reinforcement Learning Deployed in Real- Life Using Policy Distillation and Sim2real Transfer,

    R. Traor ´e, H. Caselles-Dupr ´e, T. Lesort, T. Sun, N. D ´ıaz-Rodr´ıguez, and D. Filliat, “Continual Reinforcement Learning Deployed in Real- Life Using Policy Distillation and Sim2real Transfer,” in ICML Work- shop on Multi-Task and Lifelong Learning . arXiv, June 2019

  30. [30]

    UNCLEAR: A Straightforward Method for Continual Reinforcement Learning,

    S. Kessler, J. Parker-Holder, P. Ball, S. Zohren, and S. J. Roberts, “UNCLEAR: A Straightforward Method for Continual Reinforcement Learning,” in ICML Workshop on Continual Learning , vol. 108. Vienna, Austria: PMLR, 2020

  31. [31]

    Safety-Oriented Stability Biases for Continual Learning,

    A. Gaurav, “Safety-Oriented Stability Biases for Continual Learning,” Master’s thesis, University of Waterloo, 2020

  32. [32]

    Robotiq Gripper

    “Robotiq Gripper.” [Online]. Available: http://robotiq.com/products/ industrial-robot-hand/

  33. [33]

    A Comparison of Action Spaces for Learning Manipulation Tasks,

    P. Varin, L. Grossman, and S. Kuindersma, “A Comparison of Action Spaces for Learning Manipulation Tasks,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , Macau,China, 2019, pp. 6015–6021

  34. [34]

    Parameter Identification of the KUKA LBR iiwa Robot Including Constraints on Physical Feasibility,

    Y . R. St ¨urz, L. M. Affolter, and R. S. Smith, “Parameter Identification of the KUKA LBR iiwa Robot Including Constraints on Physical Feasibility,”IFAC-PapersOnLine, vol. 50, no. 1, pp. 6863–6868, 2017

  35. [35]

    ROS-Industrial — Applying the Robot Operating System (ROS) to Industrial Applications,

    S. Edwards and C. Lewis, “ROS-Industrial — Applying the Robot Operating System (ROS) to Industrial Applications,” in ECHORD Workshop at the IEEE International Conference on Robotics and Automation (ICRA), St. Paul, MN, USA, 2012

  36. [36]

    Towards MRI-Based Autonomous Robotic Us Ac- quisitions: A First Feasibility Study,

    C. Hennersperger, B. Fuerst, S. Virga, O. Zettinig, B. Frisch, T. Neff, and N. Navab, “Towards MRI-Based Autonomous Robotic Us Ac- quisitions: A First Feasibility Study,” IEEE Transactions on Medical Imaging, vol. 36, no. 2, pp. 538–548, Feb. 2017

  37. [37]

    ROS: An Open-Source Robot Operating System,

    M. Quigley, B. Gerkey, K. Conley, J. Faust, T. Foote, J. Leibs, E. Berger, R. Wheeler, and A. Ng, “ROS: An Open-Source Robot Operating System,” in ICRA Workshop on Open Source Software , vol. 3, 2009, p. 6

  38. [38]

    Stable Baselines,

    A. Hill, A. Raffin, M. Ernestus, A. Gleave, A. Kanervisto, R. Traore, P. Dhariwal, C. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, and Y . Wu, “Stable Baselines,” 2018

  39. [39]

    Smooth Exploration for Robotic Reinforcement Learning,

    A. Raffin, J. Kober, and F. Stulp, “Smooth Exploration for Robotic Reinforcement Learning,” in Conference on Robot Learning (CoRL) , vol. 164. London, UK: PMLR, Jan. 2022, pp. 1634–1644

  40. [40]

    Malmir, J

    M. Malmir, J. Josifovski, N. Klarmann, and A. Knoll, “DiAReL: Rein- forcement Learning with Disturbance Awareness for Robust Sim2Real Policy Transfer in Robot Control, Tech. Rep. arXiv:2306.09010, 2023

  41. [41]

    Cyclic policy distillation: Sample-efficient sim-to-real reinforcement learning with domain randomization,

    Y . Kadokawa, L. Zhu, Y . Tsurumine, and T. Matsubara, “Cyclic policy distillation: Sample-efficient sim-to-real reinforcement learning with domain randomization,” Robotics and Autonomous Systems , vol. 165, p. 104425, 2023